linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device
@ 2019-11-21 11:23 Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 01/10] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header Liu Yi L
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patchset aims to add a vfio-pci-like meta driver as a demo
user of the vfio changes introduced in "vfio/mdev: IOMMU aware
mediated device" patchset from Baolu Lu. Besides the test purpose,
per Alex's comments, it could also be a good base driver for
experimenting with device specific mdev migration.

Specific interface tested in this proposal:
 *) int mdev_set_iommu_device(struct device *dev,
 				struct device *iommu_device)
    introduced in the patch as below:
    "[PATCH v5 6/8] vfio/mdev: Add iommu related member in mdev_device"

Patch Overview:
 *) patch 1 ~ 7: code refactor for existing vfio-pci module
                 move the common codes from vfio_pci.c to
                 vfio_pci_common.c
 *) patch 8: add protection to perm_bits alloc/free
 *) patch 9: add vfio-mdev-pci sample driver
 *) patch 10: refine the sample driver

Links:
 *) Link of "vfio/mdev: IOMMU aware mediated device"
         https://lwn.net/Articles/780522/
 *) Previous versions:
         Patch v2: https://lkml.org/lkml/2019/9/6/115
         Patch v1: https://www.spinics.net/lists/kvm/msg188952.html
         RFC v3: https://lkml.org/lkml/2019/4/24/495
         RFC v2: https://lkml.org/lkml/2019/3/13/113
         RFC v1: https://lkml.org/lkml/2019/3/4/529
 *) may try it with the codes in below repo
    https://github.com/luxis1999/vfio-mdev-pci-sample-driver.git : v5.4-rc7-pci-mdev

Please feel free give your comments.

Thanks,
Yi Liu

Change log:
  patch v2 -> patch v3:
  - refresh the disable_idle_d3, disable_vga and nointxmask config
    according to user config in device open.
  - add a semaphore around the vfio-pci cap/ecap perm bits allocation/free
  - drop the non-singleton iommu group support to keep it simple as it's
    a sample driver for now.

  patch v1 -> patch v2:
  - the sample driver implementation refined
  - the sample driver can work on non-singleton iommu groups
  - the sample driver can work with vfio-pci, devices from a non-singleton
    group can either be bound to vfio-mdev-pci or vfio-pci, and the
    assignment of this group still follows current vfio assignment rule.

  RFC v3 -> patch v1:
  - split the patchset from 3 patches to 9 patches to better demonstrate
    the changes step by step

  rfc v2->v3:
  - use vfio-mdev-pci instead of vfio-pci-mdev
  - place the new driver under drivers/vfio/pci while define
    Kconfig in samples/Kconfig to clarify it is a sample driver

  rfc v1->v2:
  - instead of adding kernel option to existing vfio-pci
    module in v1, v2 follows Alex's suggestion to add a
    separate vfio-pci-mdev module.
  - new patchset subject: "vfio/pci: wrap pci device as a mediated device"


Alex Williamson (1):
  samples: refine vfio-mdev-pci driver

Liu Yi L (9):
  vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header
  vfio_pci: refine user config reference in vfio-pci module
  vfio_pci: refine vfio_pci_driver reference in vfio_pci.c
  vfio_pci: make common functions be extern
  vfio_pci: duplicate vfio_pci.c
  vfio_pci: shrink vfio_pci_common.c
  vfio_pci: shrink vfio_pci.c
  vfio/pci: protect cap/ecap_perm bits alloc/free
  samples: add vfio-mdev-pci driver

 drivers/vfio/pci/Makefile           |    9 +-
 drivers/vfio/pci/vfio_mdev_pci.c    |  430 ++++++++++
 drivers/vfio/pci/vfio_pci.c         | 1460 +---------------------------------
 drivers/vfio/pci/vfio_pci_common.c  | 1471 +++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_config.c  |   33 +-
 drivers/vfio/pci/vfio_pci_private.h |   39 +
 samples/Kconfig                     |   11 +
 7 files changed, 2000 insertions(+), 1453 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c
 create mode 100644 drivers/vfio/pci/vfio_pci_common.c

-- 
2.7.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 01/10] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 02/10] vfio_pci: refine user config reference in vfio-pci module Liu Yi L
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch fix an issue regards to always_inline. e.g.:

"error: inlining failed in call to always_inline ‘vfio_pci_is_vga’:
function body not available".

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 14 --------------
 drivers/vfio/pci/vfio_pci_private.h | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0220616..79460c3 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -54,15 +54,6 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(disable_idle_d3,
 		 "Disable using the PCI D3 low power state for idle, unused devices");
 
-static inline bool vfio_vga_disabled(void)
-{
-#ifdef CONFIG_VFIO_PCI_VGA
-	return disable_vga;
-#else
-	return true;
-#endif
-}
-
 /*
  * Our VGA arbiter participation is limited since we don't know anything
  * about the device itself.  However, if the device is the only VGA device
@@ -102,11 +93,6 @@ static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
 	return decodes;
 }
 
-static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
-{
-	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
-}
-
 static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 {
 	struct resource *res;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ee6ee91..f12d92c 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -130,6 +130,20 @@ struct vfio_pci_device {
 #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
 #define irq_is(vdev, type) (vdev->irq_type == type)
 
+static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
+{
+	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
+}
+
+static inline bool vfio_vga_disabled(void)
+{
+#ifdef CONFIG_VFIO_PCI_VGA
+	return disable_vga;
+#else
+	return true;
+#endif
+}
+
 extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
 extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 02/10] vfio_pci: refine user config reference in vfio-pci module
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 01/10] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c Liu Yi L
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch adds three fields in struct vfio_pci_device to pass the user
configs of vfio-pci module to some functions which could be common in
future usage.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 33 ++++++++++++++++++++++++---------
 drivers/vfio/pci/vfio_pci_private.h | 12 ++++++++++--
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 79460c3..b04e43a 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -69,7 +69,8 @@ static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
 	unsigned char max_busnr;
 	unsigned int decodes;
 
-	if (single_vga || !vfio_vga_disabled() || pci_is_root_bus(pdev->bus))
+	if (single_vga || !vfio_vga_disabled(vdev) ||
+		pci_is_root_bus(pdev->bus))
 		return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
 		       VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
 
@@ -273,7 +274,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	if (!vdev->pci_saved_state)
 		pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
 
-	if (likely(!nointxmask)) {
+	if (likely(!vdev->nointxmask)) {
 		if (vfio_pci_nointx(pdev)) {
 			pci_info(pdev, "Masking broken INTx support\n");
 			vdev->nointx = true;
@@ -310,7 +311,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	} else
 		vdev->msix_bar = 0xFF;
 
-	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
+	if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev))
 		vdev->has_vga = true;
 
 
@@ -445,10 +446,17 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 
 	vfio_pci_try_bus_reset(vdev);
 
-	if (!disable_idle_d3)
+	if (!vdev->disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D3hot);
 }
 
+void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
+			bool nointxmask, bool disable_idle_d3)
+{
+	vdev->nointxmask = nointxmask;
+	vdev->disable_idle_d3 = disable_idle_d3;
+}
+
 static void vfio_pci_release(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
@@ -473,6 +481,8 @@ static int vfio_pci_open(void *device_data)
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
 
+	vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3);
+
 	mutex_lock(&vdev->reflck->lock);
 
 	if (!vdev->refcnt) {
@@ -1313,6 +1323,11 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	vdev->nointxmask = nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	vdev->disable_vga = disable_vga;
+#endif
+	vdev->disable_idle_d3 = disable_idle_d3;
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret) {
@@ -1337,7 +1352,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	vfio_pci_probe_power_state(vdev);
 
-	if (!disable_idle_d3) {
+	if (!vdev->disable_idle_d3) {
 		/*
 		 * pci-core sets the device power state to an unknown value at
 		 * bootup and after being removed from a driver.  The only
@@ -1368,7 +1383,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 	kfree(vdev->region);
 	mutex_destroy(&vdev->ioeventfds_lock);
 
-	if (!disable_idle_d3)
+	if (!vdev->disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D0);
 
 	kfree(vdev->pm_save);
@@ -1603,7 +1618,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 		if (!ret) {
 			tmp->needs_reset = false;
 
-			if (tmp != vdev && !disable_idle_d3)
+			if (tmp != vdev && !tmp->disable_idle_d3)
 				vfio_pci_set_power_state(tmp, PCI_D3hot);
 		}
 
@@ -1619,7 +1634,7 @@ static void __exit vfio_pci_cleanup(void)
 	vfio_pci_uninit_perm_bits();
 }
 
-static void __init vfio_pci_fill_ids(void)
+static void __init vfio_pci_fill_ids(char *ids)
 {
 	char *p, *id;
 	int rc;
@@ -1674,7 +1689,7 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
-	vfio_pci_fill_ids();
+	vfio_pci_fill_ids(ids);
 
 	return 0;
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index f12d92c..eae8d94 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -122,6 +122,11 @@ struct vfio_pci_device {
 	struct list_head	dummy_resources_list;
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;
+	bool			nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	bool			disable_vga;
+#endif
+	bool			disable_idle_d3;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
@@ -135,15 +140,18 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }
 
-static inline bool vfio_vga_disabled(void)
+static inline bool vfio_vga_disabled(struct vfio_pci_device *vdev)
 {
 #ifdef CONFIG_VFIO_PCI_VGA
-	return disable_vga;
+	return vdev->disable_vga;
 #else
 	return true;
 #endif
 }
 
+extern void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
+				bool nointxmask, bool disable_idle_d3);
+
 extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
 extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 01/10] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 02/10] vfio_pci: refine user config reference in vfio-pci module Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-12-15 22:46   ` Alex Williamson
  2019-11-21 11:23 ` [PATCH v3 04/10] vfio_pci: make common functions be extern Liu Yi L
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch replaces the vfio_pci_driver reference in vfio_pci.c with
pci_dev_driver(vdev->pdev) which is more helpful to make the functions
be generic to module types.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b04e43a..2096e66 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1460,24 +1460,25 @@ static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck)
 
 static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
 {
-	struct vfio_pci_reflck **preflck = data;
+	struct vfio_pci_device *vdev = data;
+	struct vfio_pci_reflck **preflck = &vdev->reflck;
 	struct vfio_device *device;
-	struct vfio_pci_device *vdev;
+	struct vfio_pci_device *tmp;
 
 	device = vfio_device_get_from_dev(&pdev->dev);
 	if (!device)
 		return 0;
 
-	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+	if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
 		vfio_device_put(device);
 		return 0;
 	}
 
-	vdev = vfio_device_data(device);
+	tmp = vfio_device_data(device);
 
-	if (vdev->reflck) {
-		vfio_pci_reflck_get(vdev->reflck);
-		*preflck = vdev->reflck;
+	if (tmp->reflck) {
+		vfio_pci_reflck_get(tmp->reflck);
+		*preflck = tmp->reflck;
 		vfio_device_put(device);
 		return 1;
 	}
@@ -1494,7 +1495,7 @@ static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
 
 	if (pci_is_root_bus(vdev->pdev->bus) ||
 	    vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find,
-					  &vdev->reflck, slot) <= 0)
+					  vdev, slot) <= 0)
 		vdev->reflck = vfio_pci_reflck_alloc();
 
 	mutex_unlock(&reflck_lock);
@@ -1519,6 +1520,7 @@ static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
 
 struct vfio_devices {
 	struct vfio_device **devices;
+	struct vfio_pci_device *vdev;
 	int cur_index;
 	int max_index;
 };
@@ -1527,7 +1529,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 {
 	struct vfio_devices *devs = data;
 	struct vfio_device *device;
-	struct vfio_pci_device *vdev;
+	struct vfio_pci_device *tmp;
 
 	if (devs->cur_index == devs->max_index)
 		return -ENOSPC;
@@ -1536,15 +1538,15 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 	if (!device)
 		return -EINVAL;
 
-	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+	if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) {
 		vfio_device_put(device);
 		return -EBUSY;
 	}
 
-	vdev = vfio_device_data(device);
+	tmp = vfio_device_data(device);
 
 	/* Fault if the device is not unused */
-	if (vdev->refcnt) {
+	if (tmp->refcnt) {
 		vfio_device_put(device);
 		return -EBUSY;
 	}
@@ -1590,6 +1592,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 	if (!devs.devices)
 		return;
 
+	devs.vdev = vdev;
 	if (vfio_pci_for_each_slot_or_bus(vdev->pdev,
 					  vfio_pci_get_unused_devs,
 					  &devs, slot))
@@ -1634,7 +1637,7 @@ static void __exit vfio_pci_cleanup(void)
 	vfio_pci_uninit_perm_bits();
 }
 
-static void __init vfio_pci_fill_ids(char *ids)
+static void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 {
 	char *p, *id;
 	int rc;
@@ -1662,7 +1665,7 @@ static void __init vfio_pci_fill_ids(char *ids)
 			continue;
 		}
 
-		rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
+		rc = pci_add_dynid(driver, vendor, device,
 				   subvendor, subdevice, class, class_mask, 0);
 		if (rc)
 			pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
@@ -1689,7 +1692,7 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
-	vfio_pci_fill_ids(ids);
+	vfio_pci_fill_ids(ids, &vfio_pci_driver);
 
 	return 0;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 04/10] vfio_pci: make common functions be extern
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (2 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 05/10] vfio_pci: duplicate vfio_pci.c Liu Yi L
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch makes the common functions (module agnostic functions) in
vfio_pci.c to extern. So that such functions could be moved to a common
source file.
  *) vfio_pci_set_vga_decode
  *) vfio_pci_enable
  *) vfio_pci_disable
  *) vfio_pci_ioctl
  *) vfio_pci_read
  *) vfio_pci_write
  *) vfio_pci_mmap
  *) vfio_pci_request
  *) vfio_pci_fill_ids
  *) vfio_pci_reflck_attach
  *) vfio_pci_reflck_put
  *) vfio_pci_probe_power_state

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 30 +++++++++++++-----------------
 drivers/vfio/pci/vfio_pci_private.h | 15 +++++++++++++++
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2096e66..80a6b15 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -62,7 +62,7 @@ MODULE_PARM_DESC(disable_idle_d3,
  * has no way to get to it and routing can be disabled externally at the
  * bridge.
  */
-static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
+unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
 {
 	struct vfio_pci_device *vdev = opaque;
 	struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
@@ -163,7 +163,6 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 }
 
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
-static void vfio_pci_disable(struct vfio_pci_device *vdev);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -194,7 +193,7 @@ static bool vfio_pci_nointx(struct pci_dev *pdev)
 	return false;
 }
 
-static void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
+void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	u16 pmcsr;
@@ -245,7 +244,7 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
 	return ret;
 }
 
-static int vfio_pci_enable(struct vfio_pci_device *vdev)
+int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	int ret;
@@ -352,7 +351,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	return ret;
 }
 
-static void vfio_pci_disable(struct vfio_pci_device *vdev)
+void vfio_pci_disable(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_dummy_resource *dummy_res, *tmp;
@@ -684,8 +683,8 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 	return 0;
 }
 
-static long vfio_pci_ioctl(void *device_data,
-			   unsigned int cmd, unsigned long arg)
+long vfio_pci_ioctl(void *device_data,
+		   unsigned int cmd, unsigned long arg)
 {
 	struct vfio_pci_device *vdev = device_data;
 	unsigned long minsz;
@@ -1170,7 +1169,7 @@ static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
 	return -EINVAL;
 }
 
-static ssize_t vfio_pci_read(void *device_data, char __user *buf,
+ssize_t vfio_pci_read(void *device_data, char __user *buf,
 			     size_t count, loff_t *ppos)
 {
 	if (!count)
@@ -1179,7 +1178,7 @@ static ssize_t vfio_pci_read(void *device_data, char __user *buf,
 	return vfio_pci_rw(device_data, buf, count, ppos, false);
 }
 
-static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 			      size_t count, loff_t *ppos)
 {
 	if (!count)
@@ -1188,7 +1187,7 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
 
-static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
+int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct pci_dev *pdev = vdev->pdev;
@@ -1250,7 +1249,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 			       req_len, vma->vm_page_prot);
 }
 
-static void vfio_pci_request(void *device_data, unsigned int count)
+void vfio_pci_request(void *device_data, unsigned int count)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct pci_dev *pdev = vdev->pdev;
@@ -1282,9 +1281,6 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.request	= vfio_pci_request,
 };
 
-static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
-static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
-
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct vfio_pci_device *vdev;
@@ -1487,7 +1483,7 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
 	return 0;
 }
 
-static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
+int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
 {
 	bool slot = !pci_probe_reset_slot(vdev->pdev->slot);
 
@@ -1513,7 +1509,7 @@ static void vfio_pci_reflck_release(struct kref *kref)
 	mutex_unlock(&reflck_lock);
 }
 
-static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
+void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
 {
 	kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock);
 }
@@ -1637,7 +1633,7 @@ static void __exit vfio_pci_cleanup(void)
 	vfio_pci_uninit_perm_bits();
 }
 
-static void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
+void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 {
 	char *p, *id;
 	int rc;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index eae8d94..670268e 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -185,6 +185,21 @@ extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 
 extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
 				    pci_power_t state);
+extern unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga);
+extern int vfio_pci_enable(struct vfio_pci_device *vdev);
+extern void vfio_pci_disable(struct vfio_pci_device *vdev);
+extern long vfio_pci_ioctl(void *device_data,
+			unsigned int cmd, unsigned long arg);
+extern ssize_t vfio_pci_read(void *device_data, char __user *buf,
+			size_t count, loff_t *ppos);
+extern ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+			size_t count, loff_t *ppos);
+extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma);
+extern void vfio_pci_request(void *device_data, unsigned int count);
+extern void vfio_pci_fill_ids(char *ids, struct pci_driver *driver);
+extern int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
+extern void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
+extern void vfio_pci_probe_power_state(struct vfio_pci_device *vdev);
 
 #ifdef CONFIG_VFIO_PCI_IGD
 extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 05/10] vfio_pci: duplicate vfio_pci.c
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (3 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 04/10] vfio_pci: make common functions be extern Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 06/10] vfio_pci: shrink vfio_pci_common.c Liu Yi L
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch has no code change, just a file copy. In following patches,
vfio_pci_common.c will be modified to only include the common functions
and related static functions in original vfio_pci.c. Meanwhile, vfio_pci.c
will be modified to only include vfio-pci module specific codes.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_common.c | 1706 ++++++++++++++++++++++++++++++++++++
 1 file changed, 1706 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_common.c

diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c
new file mode 100644
index 0000000..80a6b15
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_common.c
@@ -0,0 +1,1706 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#define dev_fmt pr_fmt
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vgaarb.h>
+#include <linux/nospec.h>
+
+#include "vfio_pci_private.h"
+
+#define DRIVER_VERSION  "0.2"
+#define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
+#define DRIVER_DESC     "VFIO PCI - User Level meta-driver"
+
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
+
+static bool nointxmask;
+module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(nointxmask,
+		  "Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.");
+
+#ifdef CONFIG_VFIO_PCI_VGA
+static bool disable_vga;
+module_param(disable_vga, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci");
+#endif
+
+static bool disable_idle_d3;
+module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_idle_d3,
+		 "Disable using the PCI D3 low power state for idle, unused devices");
+
+/*
+ * Our VGA arbiter participation is limited since we don't know anything
+ * about the device itself.  However, if the device is the only VGA device
+ * downstream of a bridge and VFIO VGA support is disabled, then we can
+ * safely return legacy VGA IO and memory as not decoded since the user
+ * has no way to get to it and routing can be disabled externally at the
+ * bridge.
+ */
+unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
+{
+	struct vfio_pci_device *vdev = opaque;
+	struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
+	unsigned char max_busnr;
+	unsigned int decodes;
+
+	if (single_vga || !vfio_vga_disabled(vdev) ||
+		pci_is_root_bus(pdev->bus))
+		return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+		       VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+
+	max_busnr = pci_bus_max_busnr(pdev->bus);
+	decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
+
+	while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) {
+		if (tmp == pdev ||
+		    pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) ||
+		    pci_is_root_bus(tmp->bus))
+			continue;
+
+		if (tmp->bus->number >= pdev->bus->number &&
+		    tmp->bus->number <= max_busnr) {
+			pci_dev_put(tmp);
+			decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+			break;
+		}
+	}
+
+	return decodes;
+}
+
+static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
+{
+	struct resource *res;
+	int bar;
+	struct vfio_pci_dummy_resource *dummy_res;
+
+	INIT_LIST_HEAD(&vdev->dummy_resources_list);
+
+	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
+		res = vdev->pdev->resource + bar;
+
+		if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
+			goto no_mmap;
+
+		if (!(res->flags & IORESOURCE_MEM))
+			goto no_mmap;
+
+		/*
+		 * The PCI core shouldn't set up a resource with a
+		 * type but zero size. But there may be bugs that
+		 * cause us to do that.
+		 */
+		if (!resource_size(res))
+			goto no_mmap;
+
+		if (resource_size(res) >= PAGE_SIZE) {
+			vdev->bar_mmap_supported[bar] = true;
+			continue;
+		}
+
+		if (!(res->start & ~PAGE_MASK)) {
+			/*
+			 * Add a dummy resource to reserve the remainder
+			 * of the exclusive page in case that hot-add
+			 * device's bar is assigned into it.
+			 */
+			dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL);
+			if (dummy_res == NULL)
+				goto no_mmap;
+
+			dummy_res->resource.name = "vfio sub-page reserved";
+			dummy_res->resource.start = res->end + 1;
+			dummy_res->resource.end = res->start + PAGE_SIZE - 1;
+			dummy_res->resource.flags = res->flags;
+			if (request_resource(res->parent,
+						&dummy_res->resource)) {
+				kfree(dummy_res);
+				goto no_mmap;
+			}
+			dummy_res->index = bar;
+			list_add(&dummy_res->res_next,
+					&vdev->dummy_resources_list);
+			vdev->bar_mmap_supported[bar] = true;
+			continue;
+		}
+		/*
+		 * Here we don't handle the case when the BAR is not page
+		 * aligned because we can't expect the BAR will be
+		 * assigned into the same location in a page in guest
+		 * when we passthrough the BAR. And it's hard to access
+		 * this BAR in userspace because we have no way to get
+		 * the BAR's location in a page.
+		 */
+no_mmap:
+		vdev->bar_mmap_supported[bar] = false;
+	}
+}
+
+static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
+
+/*
+ * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
+ * _and_ the ability detect when the device is asserting INTx via PCI_STATUS.
+ * If a device implements the former but not the latter we would typically
+ * expect broken_intx_masking be set and require an exclusive interrupt.
+ * However since we do have control of the device's ability to assert INTx,
+ * we can instead pretend that the device does not implement INTx, virtualizing
+ * the pin register to report zero and maintaining DisINTx set on the host.
+ */
+static bool vfio_pci_nointx(struct pci_dev *pdev)
+{
+	switch (pdev->vendor) {
+	case PCI_VENDOR_ID_INTEL:
+		switch (pdev->device) {
+		/* All i40e (XL710/X710/XXV710) 10/20/25/40GbE NICs */
+		case 0x1572:
+		case 0x1574:
+		case 0x1580 ... 0x1581:
+		case 0x1583 ... 0x158b:
+		case 0x37d0 ... 0x37d2:
+			return true;
+		default:
+			return false;
+		}
+	}
+
+	return false;
+}
+
+void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	u16 pmcsr;
+
+	if (!pdev->pm_cap)
+		return;
+
+	pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr);
+
+	vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET);
+}
+
+/*
+ * pci_set_power_state() wrapper handling devices which perform a soft reset on
+ * D3->D0 transition.  Save state prior to D0/1/2->D3, stash it on the vdev,
+ * restore when returned to D0.  Saved separately from pci_saved_state for use
+ * by PM capability emulation and separately from pci_dev internal saved state
+ * to avoid it being overwritten and consumed around other resets.
+ */
+int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	bool needs_restore = false, needs_save = false;
+	int ret;
+
+	if (vdev->needs_pm_restore) {
+		if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) {
+			pci_save_state(pdev);
+			needs_save = true;
+		}
+
+		if (pdev->current_state >= PCI_D3hot && state <= PCI_D0)
+			needs_restore = true;
+	}
+
+	ret = pci_set_power_state(pdev, state);
+
+	if (!ret) {
+		/* D3 might be unsupported via quirk, skip unless in D3 */
+		if (needs_save && pdev->current_state >= PCI_D3hot) {
+			vdev->pm_save = pci_store_saved_state(pdev);
+		} else if (needs_restore) {
+			pci_load_and_free_saved_state(pdev, &vdev->pm_save);
+			pci_restore_state(pdev);
+		}
+	}
+
+	return ret;
+}
+
+int vfio_pci_enable(struct vfio_pci_device *vdev)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int ret;
+	u16 cmd;
+	u8 msix_pos;
+
+	vfio_pci_set_power_state(vdev, PCI_D0);
+
+	/* Don't allow our initial saved state to include busmaster */
+	pci_clear_master(pdev);
+
+	ret = pci_enable_device(pdev);
+	if (ret)
+		return ret;
+
+	/* If reset fails because of the device lock, fail this path entirely */
+	ret = pci_try_reset_function(pdev);
+	if (ret == -EAGAIN) {
+		pci_disable_device(pdev);
+		return ret;
+	}
+
+	vdev->reset_works = !ret;
+	pci_save_state(pdev);
+	vdev->pci_saved_state = pci_store_saved_state(pdev);
+	if (!vdev->pci_saved_state)
+		pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
+
+	if (likely(!vdev->nointxmask)) {
+		if (vfio_pci_nointx(pdev)) {
+			pci_info(pdev, "Masking broken INTx support\n");
+			vdev->nointx = true;
+			pci_intx(pdev, 0);
+		} else
+			vdev->pci_2_3 = pci_intx_mask_supported(pdev);
+	}
+
+	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
+	if (vdev->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) {
+		cmd &= ~PCI_COMMAND_INTX_DISABLE;
+		pci_write_config_word(pdev, PCI_COMMAND, cmd);
+	}
+
+	ret = vfio_config_init(vdev);
+	if (ret) {
+		kfree(vdev->pci_saved_state);
+		vdev->pci_saved_state = NULL;
+		pci_disable_device(pdev);
+		return ret;
+	}
+
+	msix_pos = pdev->msix_cap;
+	if (msix_pos) {
+		u16 flags;
+		u32 table;
+
+		pci_read_config_word(pdev, msix_pos + PCI_MSIX_FLAGS, &flags);
+		pci_read_config_dword(pdev, msix_pos + PCI_MSIX_TABLE, &table);
+
+		vdev->msix_bar = table & PCI_MSIX_TABLE_BIR;
+		vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET;
+		vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16;
+	} else
+		vdev->msix_bar = 0xFF;
+
+	if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev))
+		vdev->has_vga = true;
+
+
+	if (vfio_pci_is_vga(pdev) &&
+	    pdev->vendor == PCI_VENDOR_ID_INTEL &&
+	    IS_ENABLED(CONFIG_VFIO_PCI_IGD)) {
+		ret = vfio_pci_igd_init(vdev);
+		if (ret) {
+			pci_warn(pdev, "Failed to setup Intel IGD regions\n");
+			goto disable_exit;
+		}
+	}
+
+	if (pdev->vendor == PCI_VENDOR_ID_NVIDIA &&
+	    IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {
+		ret = vfio_pci_nvdia_v100_nvlink2_init(vdev);
+		if (ret && ret != -ENODEV) {
+			pci_warn(pdev, "Failed to setup NVIDIA NV2 RAM region\n");
+			goto disable_exit;
+		}
+	}
+
+	if (pdev->vendor == PCI_VENDOR_ID_IBM &&
+	    IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {
+		ret = vfio_pci_ibm_npu2_init(vdev);
+		if (ret && ret != -ENODEV) {
+			pci_warn(pdev, "Failed to setup NVIDIA NV2 ATSD region\n");
+			goto disable_exit;
+		}
+	}
+
+	vfio_pci_probe_mmaps(vdev);
+
+	return 0;
+
+disable_exit:
+	vfio_pci_disable(vdev);
+	return ret;
+}
+
+void vfio_pci_disable(struct vfio_pci_device *vdev)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_dummy_resource *dummy_res, *tmp;
+	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
+	int i, bar;
+
+	/* Stop the device from further DMA */
+	pci_clear_master(pdev);
+
+	vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+				VFIO_IRQ_SET_ACTION_TRIGGER,
+				vdev->irq_type, 0, 0, NULL);
+
+	/* Device closed, don't need mutex here */
+	list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
+				 &vdev->ioeventfds_list, next) {
+		vfio_virqfd_disable(&ioeventfd->virqfd);
+		list_del(&ioeventfd->next);
+		kfree(ioeventfd);
+	}
+	vdev->ioeventfds_nr = 0;
+
+	vdev->virq_disabled = false;
+
+	for (i = 0; i < vdev->num_regions; i++)
+		vdev->region[i].ops->release(vdev, &vdev->region[i]);
+
+	vdev->num_regions = 0;
+	kfree(vdev->region);
+	vdev->region = NULL; /* don't krealloc a freed pointer */
+
+	vfio_config_free(vdev);
+
+	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
+		if (!vdev->barmap[bar])
+			continue;
+		pci_iounmap(pdev, vdev->barmap[bar]);
+		pci_release_selected_regions(pdev, 1 << bar);
+		vdev->barmap[bar] = NULL;
+	}
+
+	list_for_each_entry_safe(dummy_res, tmp,
+				 &vdev->dummy_resources_list, res_next) {
+		list_del(&dummy_res->res_next);
+		release_resource(&dummy_res->resource);
+		kfree(dummy_res);
+	}
+
+	vdev->needs_reset = true;
+
+	/*
+	 * If we have saved state, restore it.  If we can reset the device,
+	 * even better.  Resetting with current state seems better than
+	 * nothing, but saving and restoring current state without reset
+	 * is just busy work.
+	 */
+	if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) {
+		pci_info(pdev, "%s: Couldn't reload saved state\n", __func__);
+
+		if (!vdev->reset_works)
+			goto out;
+
+		pci_save_state(pdev);
+	}
+
+	/*
+	 * Disable INTx and MSI, presumably to avoid spurious interrupts
+	 * during reset.  Stolen from pci_reset_function()
+	 */
+	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
+
+	/*
+	 * Try to get the locks ourselves to prevent a deadlock. The
+	 * success of this is dependent on being able to lock the device,
+	 * which is not always possible.
+	 * We can not use the "try" reset interface here, which will
+	 * overwrite the previously restored configuration information.
+	 */
+	if (vdev->reset_works && pci_cfg_access_trylock(pdev)) {
+		if (device_trylock(&pdev->dev)) {
+			if (!__pci_reset_function_locked(pdev))
+				vdev->needs_reset = false;
+			device_unlock(&pdev->dev);
+		}
+		pci_cfg_access_unlock(pdev);
+	}
+
+	pci_restore_state(pdev);
+out:
+	pci_disable_device(pdev);
+
+	vfio_pci_try_bus_reset(vdev);
+
+	if (!vdev->disable_idle_d3)
+		vfio_pci_set_power_state(vdev, PCI_D3hot);
+}
+
+void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
+			bool nointxmask, bool disable_idle_d3)
+{
+	vdev->nointxmask = nointxmask;
+	vdev->disable_idle_d3 = disable_idle_d3;
+}
+
+static void vfio_pci_release(void *device_data)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	mutex_lock(&vdev->reflck->lock);
+
+	if (!(--vdev->refcnt)) {
+		vfio_spapr_pci_eeh_release(vdev->pdev);
+		vfio_pci_disable(vdev);
+	}
+
+	mutex_unlock(&vdev->reflck->lock);
+
+	module_put(THIS_MODULE);
+}
+
+static int vfio_pci_open(void *device_data)
+{
+	struct vfio_pci_device *vdev = device_data;
+	int ret = 0;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3);
+
+	mutex_lock(&vdev->reflck->lock);
+
+	if (!vdev->refcnt) {
+		ret = vfio_pci_enable(vdev);
+		if (ret)
+			goto error;
+
+		vfio_spapr_pci_eeh_open(vdev->pdev);
+	}
+	vdev->refcnt++;
+error:
+	mutex_unlock(&vdev->reflck->lock);
+	if (ret)
+		module_put(THIS_MODULE);
+	return ret;
+}
+
+static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
+{
+	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
+		u8 pin;
+
+		if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) ||
+		    vdev->nointx || vdev->pdev->is_virtfn)
+			return 0;
+
+		pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
+
+		return pin ? 1 : 0;
+	} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
+		u8 pos;
+		u16 flags;
+
+		pos = vdev->pdev->msi_cap;
+		if (pos) {
+			pci_read_config_word(vdev->pdev,
+					     pos + PCI_MSI_FLAGS, &flags);
+			return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1);
+		}
+	} else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
+		u8 pos;
+		u16 flags;
+
+		pos = vdev->pdev->msix_cap;
+		if (pos) {
+			pci_read_config_word(vdev->pdev,
+					     pos + PCI_MSIX_FLAGS, &flags);
+
+			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
+		}
+	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+		if (pci_is_pcie(vdev->pdev))
+			return 1;
+	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
+		return 1;
+	}
+
+	return 0;
+}
+
+static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
+{
+	(*(int *)data)++;
+	return 0;
+}
+
+struct vfio_pci_fill_info {
+	int max;
+	int cur;
+	struct vfio_pci_dependent_device *devices;
+};
+
+static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
+{
+	struct vfio_pci_fill_info *fill = data;
+	struct iommu_group *iommu_group;
+
+	if (fill->cur == fill->max)
+		return -EAGAIN; /* Something changed, try again */
+
+	iommu_group = iommu_group_get(&pdev->dev);
+	if (!iommu_group)
+		return -EPERM; /* Cannot reset non-isolated devices */
+
+	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
+	fill->devices[fill->cur].bus = pdev->bus->number;
+	fill->devices[fill->cur].devfn = pdev->devfn;
+	fill->cur++;
+	iommu_group_put(iommu_group);
+	return 0;
+}
+
+struct vfio_pci_group_entry {
+	struct vfio_group *group;
+	int id;
+};
+
+struct vfio_pci_group_info {
+	int count;
+	struct vfio_pci_group_entry *groups;
+};
+
+static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data)
+{
+	struct vfio_pci_group_info *info = data;
+	struct iommu_group *group;
+	int id, i;
+
+	group = iommu_group_get(&pdev->dev);
+	if (!group)
+		return -EPERM;
+
+	id = iommu_group_id(group);
+
+	for (i = 0; i < info->count; i++)
+		if (info->groups[i].id == id)
+			break;
+
+	iommu_group_put(group);
+
+	return (i == info->count) ? -EINVAL : 0;
+}
+
+static bool vfio_pci_dev_below_slot(struct pci_dev *pdev, struct pci_slot *slot)
+{
+	for (; pdev; pdev = pdev->bus->self)
+		if (pdev->bus == slot->bus)
+			return (pdev->slot == slot);
+	return false;
+}
+
+struct vfio_pci_walk_info {
+	int (*fn)(struct pci_dev *, void *data);
+	void *data;
+	struct pci_dev *pdev;
+	bool slot;
+	int ret;
+};
+
+static int vfio_pci_walk_wrapper(struct pci_dev *pdev, void *data)
+{
+	struct vfio_pci_walk_info *walk = data;
+
+	if (!walk->slot || vfio_pci_dev_below_slot(pdev, walk->pdev->slot))
+		walk->ret = walk->fn(pdev, walk->data);
+
+	return walk->ret;
+}
+
+static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
+					 int (*fn)(struct pci_dev *,
+						   void *data), void *data,
+					 bool slot)
+{
+	struct vfio_pci_walk_info walk = {
+		.fn = fn, .data = data, .pdev = pdev, .slot = slot, .ret = 0,
+	};
+
+	pci_walk_bus(pdev->bus, vfio_pci_walk_wrapper, &walk);
+
+	return walk.ret;
+}
+
+static int msix_mmappable_cap(struct vfio_pci_device *vdev,
+			      struct vfio_info_cap *caps)
+{
+	struct vfio_info_cap_header header = {
+		.id = VFIO_REGION_INFO_CAP_MSIX_MAPPABLE,
+		.version = 1
+	};
+
+	return vfio_info_add_capability(caps, &header, sizeof(header));
+}
+
+int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
+				 unsigned int type, unsigned int subtype,
+				 const struct vfio_pci_regops *ops,
+				 size_t size, u32 flags, void *data)
+{
+	struct vfio_pci_region *region;
+
+	region = krealloc(vdev->region,
+			  (vdev->num_regions + 1) * sizeof(*region),
+			  GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
+
+	vdev->region = region;
+	vdev->region[vdev->num_regions].type = type;
+	vdev->region[vdev->num_regions].subtype = subtype;
+	vdev->region[vdev->num_regions].ops = ops;
+	vdev->region[vdev->num_regions].size = size;
+	vdev->region[vdev->num_regions].flags = flags;
+	vdev->region[vdev->num_regions].data = data;
+
+	vdev->num_regions++;
+
+	return 0;
+}
+
+long vfio_pci_ioctl(void *device_data,
+		   unsigned int cmd, unsigned long arg)
+{
+	struct vfio_pci_device *vdev = device_data;
+	unsigned long minsz;
+
+	if (cmd == VFIO_DEVICE_GET_INFO) {
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		info.flags = VFIO_DEVICE_FLAGS_PCI;
+
+		if (vdev->reset_works)
+			info.flags |= VFIO_DEVICE_FLAGS_RESET;
+
+		info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
+		info.num_irqs = VFIO_PCI_NUM_IRQS;
+
+		return copy_to_user((void __user *)arg, &info, minsz) ?
+			-EFAULT : 0;
+
+	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+		struct pci_dev *pdev = vdev->pdev;
+		struct vfio_region_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		int i, ret;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		switch (info.index) {
+		case VFIO_PCI_CONFIG_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = pdev->cfg_size;
+			info.flags = VFIO_REGION_INFO_FLAG_READ |
+				     VFIO_REGION_INFO_FLAG_WRITE;
+			break;
+		case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = pci_resource_len(pdev, info.index);
+			if (!info.size) {
+				info.flags = 0;
+				break;
+			}
+
+			info.flags = VFIO_REGION_INFO_FLAG_READ |
+				     VFIO_REGION_INFO_FLAG_WRITE;
+			if (vdev->bar_mmap_supported[info.index]) {
+				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
+				if (info.index == vdev->msix_bar) {
+					ret = msix_mmappable_cap(vdev, &caps);
+					if (ret)
+						return ret;
+				}
+			}
+
+			break;
+		case VFIO_PCI_ROM_REGION_INDEX:
+		{
+			void __iomem *io;
+			size_t size;
+			u16 orig_cmd;
+
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.flags = 0;
+
+			/* Report the BAR size, not the ROM size */
+			info.size = pci_resource_len(pdev, info.index);
+			if (!info.size) {
+				/* Shadow ROMs appear as PCI option ROMs */
+				if (pdev->resource[PCI_ROM_RESOURCE].flags &
+							IORESOURCE_ROM_SHADOW)
+					info.size = 0x20000;
+				else
+					break;
+			}
+
+			/*
+			 * Is it really there?  Enable memory decode for
+			 * implicit access in pci_map_rom().
+			 */
+			pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd);
+			pci_write_config_word(pdev, PCI_COMMAND,
+					      orig_cmd | PCI_COMMAND_MEMORY);
+
+			io = pci_map_rom(pdev, &size);
+			if (io) {
+				info.flags = VFIO_REGION_INFO_FLAG_READ;
+				pci_unmap_rom(pdev, io);
+			} else {
+				info.size = 0;
+			}
+
+			pci_write_config_word(pdev, PCI_COMMAND, orig_cmd);
+			break;
+		}
+		case VFIO_PCI_VGA_REGION_INDEX:
+			if (!vdev->has_vga)
+				return -EINVAL;
+
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = 0xc0000;
+			info.flags = VFIO_REGION_INFO_FLAG_READ |
+				     VFIO_REGION_INFO_FLAG_WRITE;
+
+			break;
+		default:
+		{
+			struct vfio_region_info_cap_type cap_type = {
+					.header.id = VFIO_REGION_INFO_CAP_TYPE,
+					.header.version = 1 };
+
+			if (info.index >=
+			    VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+				return -EINVAL;
+			info.index = array_index_nospec(info.index,
+							VFIO_PCI_NUM_REGIONS +
+							vdev->num_regions);
+
+			i = info.index - VFIO_PCI_NUM_REGIONS;
+
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = vdev->region[i].size;
+			info.flags = vdev->region[i].flags;
+
+			cap_type.type = vdev->region[i].type;
+			cap_type.subtype = vdev->region[i].subtype;
+
+			ret = vfio_info_add_capability(&caps, &cap_type.header,
+						       sizeof(cap_type));
+			if (ret)
+				return ret;
+
+			if (vdev->region[i].ops->add_capability) {
+				ret = vdev->region[i].ops->add_capability(vdev,
+						&vdev->region[i], &caps);
+				if (ret)
+					return ret;
+			}
+		}
+		}
+
+		if (caps.size) {
+			info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg +
+						  sizeof(info), caps.buf,
+						  caps.size)) {
+					kfree(caps.buf);
+					return -EFAULT;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+
+		return copy_to_user((void __user *)arg, &info, minsz) ?
+			-EFAULT : 0;
+
+	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
+			return -EINVAL;
+
+		switch (info.index) {
+		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
+		case VFIO_PCI_REQ_IRQ_INDEX:
+			break;
+		case VFIO_PCI_ERR_IRQ_INDEX:
+			if (pci_is_pcie(vdev->pdev))
+				break;
+		/* fall through */
+		default:
+			return -EINVAL;
+		}
+
+		info.flags = VFIO_IRQ_INFO_EVENTFD;
+
+		info.count = vfio_pci_get_irq_count(vdev, info.index);
+
+		if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
+			info.flags |= (VFIO_IRQ_INFO_MASKABLE |
+				       VFIO_IRQ_INFO_AUTOMASKED);
+		else
+			info.flags |= VFIO_IRQ_INFO_NORESIZE;
+
+		return copy_to_user((void __user *)arg, &info, minsz) ?
+			-EFAULT : 0;
+
+	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
+		struct vfio_irq_set hdr;
+		u8 *data = NULL;
+		int max, ret = 0;
+		size_t data_size = 0;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		max = vfio_pci_get_irq_count(vdev, hdr.index);
+
+		ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
+						 VFIO_PCI_NUM_IRQS, &data_size);
+		if (ret)
+			return ret;
+
+		if (data_size) {
+			data = memdup_user((void __user *)(arg + minsz),
+					    data_size);
+			if (IS_ERR(data))
+				return PTR_ERR(data);
+		}
+
+		mutex_lock(&vdev->igate);
+
+		ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index,
+					      hdr.start, hdr.count, data);
+
+		mutex_unlock(&vdev->igate);
+		kfree(data);
+
+		return ret;
+
+	} else if (cmd == VFIO_DEVICE_RESET) {
+		return vdev->reset_works ?
+			pci_try_reset_function(vdev->pdev) : -EINVAL;
+
+	} else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) {
+		struct vfio_pci_hot_reset_info hdr;
+		struct vfio_pci_fill_info fill = { 0 };
+		struct vfio_pci_dependent_device *devices = NULL;
+		bool slot = false;
+		int ret = 0;
+
+		minsz = offsetofend(struct vfio_pci_hot_reset_info, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (hdr.argsz < minsz)
+			return -EINVAL;
+
+		hdr.flags = 0;
+
+		/* Can we do a slot or bus reset or neither? */
+		if (!pci_probe_reset_slot(vdev->pdev->slot))
+			slot = true;
+		else if (pci_probe_reset_bus(vdev->pdev->bus))
+			return -ENODEV;
+
+		/* How many devices are affected? */
+		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
+						    vfio_pci_count_devs,
+						    &fill.max, slot);
+		if (ret)
+			return ret;
+
+		WARN_ON(!fill.max); /* Should always be at least one */
+
+		/*
+		 * If there's enough space, fill it now, otherwise return
+		 * -ENOSPC and the number of devices affected.
+		 */
+		if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
+			ret = -ENOSPC;
+			hdr.count = fill.max;
+			goto reset_info_exit;
+		}
+
+		devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
+		if (!devices)
+			return -ENOMEM;
+
+		fill.devices = devices;
+
+		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
+						    vfio_pci_fill_devs,
+						    &fill, slot);
+
+		/*
+		 * If a device was removed between counting and filling,
+		 * we may come up short of fill.max.  If a device was
+		 * added, we'll have a return of -EAGAIN above.
+		 */
+		if (!ret)
+			hdr.count = fill.cur;
+
+reset_info_exit:
+		if (copy_to_user((void __user *)arg, &hdr, minsz))
+			ret = -EFAULT;
+
+		if (!ret) {
+			if (copy_to_user((void __user *)(arg + minsz), devices,
+					 hdr.count * sizeof(*devices)))
+				ret = -EFAULT;
+		}
+
+		kfree(devices);
+		return ret;
+
+	} else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) {
+		struct vfio_pci_hot_reset hdr;
+		int32_t *group_fds;
+		struct vfio_pci_group_entry *groups;
+		struct vfio_pci_group_info info;
+		bool slot = false;
+		int i, count = 0, ret = 0;
+
+		minsz = offsetofend(struct vfio_pci_hot_reset, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (hdr.argsz < minsz || hdr.flags)
+			return -EINVAL;
+
+		/* Can we do a slot or bus reset or neither? */
+		if (!pci_probe_reset_slot(vdev->pdev->slot))
+			slot = true;
+		else if (pci_probe_reset_bus(vdev->pdev->bus))
+			return -ENODEV;
+
+		/*
+		 * We can't let userspace give us an arbitrarily large
+		 * buffer to copy, so verify how many we think there
+		 * could be.  Note groups can have multiple devices so
+		 * one group per device is the max.
+		 */
+		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
+						    vfio_pci_count_devs,
+						    &count, slot);
+		if (ret)
+			return ret;
+
+		/* Somewhere between 1 and count is OK */
+		if (!hdr.count || hdr.count > count)
+			return -EINVAL;
+
+		group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
+		groups = kcalloc(hdr.count, sizeof(*groups), GFP_KERNEL);
+		if (!group_fds || !groups) {
+			kfree(group_fds);
+			kfree(groups);
+			return -ENOMEM;
+		}
+
+		if (copy_from_user(group_fds, (void __user *)(arg + minsz),
+				   hdr.count * sizeof(*group_fds))) {
+			kfree(group_fds);
+			kfree(groups);
+			return -EFAULT;
+		}
+
+		/*
+		 * For each group_fd, get the group through the vfio external
+		 * user interface and store the group and iommu ID.  This
+		 * ensures the group is held across the reset.
+		 */
+		for (i = 0; i < hdr.count; i++) {
+			struct vfio_group *group;
+			struct fd f = fdget(group_fds[i]);
+			if (!f.file) {
+				ret = -EBADF;
+				break;
+			}
+
+			group = vfio_group_get_external_user(f.file);
+			fdput(f);
+			if (IS_ERR(group)) {
+				ret = PTR_ERR(group);
+				break;
+			}
+
+			groups[i].group = group;
+			groups[i].id = vfio_external_user_iommu_id(group);
+		}
+
+		kfree(group_fds);
+
+		/* release reference to groups on error */
+		if (ret)
+			goto hot_reset_release;
+
+		info.count = hdr.count;
+		info.groups = groups;
+
+		/*
+		 * Test whether all the affected devices are contained
+		 * by the set of groups provided by the user.
+		 */
+		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
+						    vfio_pci_validate_devs,
+						    &info, slot);
+		if (!ret)
+			/* User has access, do the reset */
+			ret = pci_reset_bus(vdev->pdev);
+
+hot_reset_release:
+		for (i--; i >= 0; i--)
+			vfio_group_put_external_user(groups[i].group);
+
+		kfree(groups);
+		return ret;
+	} else if (cmd == VFIO_DEVICE_IOEVENTFD) {
+		struct vfio_device_ioeventfd ioeventfd;
+		int count;
+
+		minsz = offsetofend(struct vfio_device_ioeventfd, fd);
+
+		if (copy_from_user(&ioeventfd, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (ioeventfd.argsz < minsz)
+			return -EINVAL;
+
+		if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK)
+			return -EINVAL;
+
+		count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK;
+
+		if (hweight8(count) != 1 || ioeventfd.fd < -1)
+			return -EINVAL;
+
+		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
+					  ioeventfd.data, count, ioeventfd.fd);
+	}
+
+	return -ENOTTY;
+}
+
+static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
+			   size_t count, loff_t *ppos, bool iswrite)
+{
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	struct vfio_pci_device *vdev = device_data;
+
+	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+		return -EINVAL;
+
+	switch (index) {
+	case VFIO_PCI_CONFIG_REGION_INDEX:
+		return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite);
+
+	case VFIO_PCI_ROM_REGION_INDEX:
+		if (iswrite)
+			return -EINVAL;
+		return vfio_pci_bar_rw(vdev, buf, count, ppos, false);
+
+	case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+		return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite);
+
+	case VFIO_PCI_VGA_REGION_INDEX:
+		return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite);
+	default:
+		index -= VFIO_PCI_NUM_REGIONS;
+		return vdev->region[index].ops->rw(vdev, buf,
+						   count, ppos, iswrite);
+	}
+
+	return -EINVAL;
+}
+
+ssize_t vfio_pci_read(void *device_data, char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	if (!count)
+		return 0;
+
+	return vfio_pci_rw(device_data, buf, count, ppos, false);
+}
+
+ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	if (!count)
+		return 0;
+
+	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
+}
+
+int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
+{
+	struct vfio_pci_device *vdev = device_data;
+	struct pci_dev *pdev = vdev->pdev;
+	unsigned int index;
+	u64 phys_len, req_len, pgoff, req_start;
+	int ret;
+
+	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+
+	if (vma->vm_end < vma->vm_start)
+		return -EINVAL;
+	if ((vma->vm_flags & VM_SHARED) == 0)
+		return -EINVAL;
+	if (index >= VFIO_PCI_NUM_REGIONS) {
+		int regnum = index - VFIO_PCI_NUM_REGIONS;
+		struct vfio_pci_region *region = vdev->region + regnum;
+
+		if (region && region->ops && region->ops->mmap &&
+		    (region->flags & VFIO_REGION_INFO_FLAG_MMAP))
+			return region->ops->mmap(vdev, region, vma);
+		return -EINVAL;
+	}
+	if (index >= VFIO_PCI_ROM_REGION_INDEX)
+		return -EINVAL;
+	if (!vdev->bar_mmap_supported[index])
+		return -EINVAL;
+
+	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
+	req_len = vma->vm_end - vma->vm_start;
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+	req_start = pgoff << PAGE_SHIFT;
+
+	if (req_start + req_len > phys_len)
+		return -EINVAL;
+
+	/*
+	 * Even though we don't make use of the barmap for the mmap,
+	 * we need to request the region and the barmap tracks that.
+	 */
+	if (!vdev->barmap[index]) {
+		ret = pci_request_selected_regions(pdev,
+						   1 << index, "vfio-pci");
+		if (ret)
+			return ret;
+
+		vdev->barmap[index] = pci_iomap(pdev, index, 0);
+		if (!vdev->barmap[index]) {
+			pci_release_selected_regions(pdev, 1 << index);
+			return -ENOMEM;
+		}
+	}
+
+	vma->vm_private_data = vdev;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
+
+	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+			       req_len, vma->vm_page_prot);
+}
+
+void vfio_pci_request(void *device_data, unsigned int count)
+{
+	struct vfio_pci_device *vdev = device_data;
+	struct pci_dev *pdev = vdev->pdev;
+
+	mutex_lock(&vdev->igate);
+
+	if (vdev->req_trigger) {
+		if (!(count % 10))
+			pci_notice_ratelimited(pdev,
+				"Relaying device request to user (#%u)\n",
+				count);
+		eventfd_signal(vdev->req_trigger, 1);
+	} else if (count == 0) {
+		pci_warn(pdev,
+			"No device request channel registered, blocked until released by user\n");
+	}
+
+	mutex_unlock(&vdev->igate);
+}
+
+static const struct vfio_device_ops vfio_pci_ops = {
+	.name		= "vfio-pci",
+	.open		= vfio_pci_open,
+	.release	= vfio_pci_release,
+	.ioctl		= vfio_pci_ioctl,
+	.read		= vfio_pci_read,
+	.write		= vfio_pci_write,
+	.mmap		= vfio_pci_mmap,
+	.request	= vfio_pci_request,
+};
+
+static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct vfio_pci_device *vdev;
+	struct iommu_group *group;
+	int ret;
+
+	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
+		return -EINVAL;
+
+	/*
+	 * Prevent binding to PFs with VFs enabled, this too easily allows
+	 * userspace instance with VFs and PFs from the same device, which
+	 * cannot work.  Disabling SR-IOV here would initiate removing the
+	 * VFs, which would unbind the driver, which is prone to blocking
+	 * if that VF is also in use by vfio-pci.  Just reject these PFs
+	 * and let the user sort it out.
+	 */
+	if (pci_num_vf(pdev)) {
+		pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
+		return -EBUSY;
+	}
+
+	group = vfio_iommu_group_get(&pdev->dev);
+	if (!group)
+		return -EINVAL;
+
+	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+	if (!vdev) {
+		vfio_iommu_group_put(group, &pdev->dev);
+		return -ENOMEM;
+	}
+
+	vdev->pdev = pdev;
+	vdev->irq_type = VFIO_PCI_NUM_IRQS;
+	mutex_init(&vdev->igate);
+	spin_lock_init(&vdev->irqlock);
+	mutex_init(&vdev->ioeventfds_lock);
+	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	vdev->nointxmask = nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	vdev->disable_vga = disable_vga;
+#endif
+	vdev->disable_idle_d3 = disable_idle_d3;
+
+	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
+	if (ret) {
+		vfio_iommu_group_put(group, &pdev->dev);
+		kfree(vdev);
+		return ret;
+	}
+
+	ret = vfio_pci_reflck_attach(vdev);
+	if (ret) {
+		vfio_del_group_dev(&pdev->dev);
+		vfio_iommu_group_put(group, &pdev->dev);
+		kfree(vdev);
+		return ret;
+	}
+
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
+		vga_set_legacy_decoding(pdev,
+					vfio_pci_set_vga_decode(vdev, false));
+	}
+
+	vfio_pci_probe_power_state(vdev);
+
+	if (!vdev->disable_idle_d3) {
+		/*
+		 * pci-core sets the device power state to an unknown value at
+		 * bootup and after being removed from a driver.  The only
+		 * transition it allows from this unknown state is to D0, which
+		 * typically happens when a driver calls pci_enable_device().
+		 * We're not ready to enable the device yet, but we do want to
+		 * be able to get to D3.  Therefore first do a D0 transition
+		 * before going to D3.
+		 */
+		vfio_pci_set_power_state(vdev, PCI_D0);
+		vfio_pci_set_power_state(vdev, PCI_D3hot);
+	}
+
+	return ret;
+}
+
+static void vfio_pci_remove(struct pci_dev *pdev)
+{
+	struct vfio_pci_device *vdev;
+
+	vdev = vfio_del_group_dev(&pdev->dev);
+	if (!vdev)
+		return;
+
+	vfio_pci_reflck_put(vdev->reflck);
+
+	vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
+	kfree(vdev->region);
+	mutex_destroy(&vdev->ioeventfds_lock);
+
+	if (!vdev->disable_idle_d3)
+		vfio_pci_set_power_state(vdev, PCI_D0);
+
+	kfree(vdev->pm_save);
+	kfree(vdev);
+
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, NULL, NULL, NULL);
+		vga_set_legacy_decoding(pdev,
+				VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+				VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
+	}
+}
+
+static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
+						  pci_channel_state_t state)
+{
+	struct vfio_pci_device *vdev;
+	struct vfio_device *device;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (device == NULL)
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	vdev = vfio_device_data(device);
+	if (vdev == NULL) {
+		vfio_device_put(device);
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	mutex_lock(&vdev->igate);
+
+	if (vdev->err_trigger)
+		eventfd_signal(vdev->err_trigger, 1);
+
+	mutex_unlock(&vdev->igate);
+
+	vfio_device_put(device);
+
+	return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
+static const struct pci_error_handlers vfio_err_handlers = {
+	.error_detected = vfio_pci_aer_err_detected,
+};
+
+static struct pci_driver vfio_pci_driver = {
+	.name		= "vfio-pci",
+	.id_table	= NULL, /* only dynamic ids */
+	.probe		= vfio_pci_probe,
+	.remove		= vfio_pci_remove,
+	.err_handler	= &vfio_err_handlers,
+};
+
+static DEFINE_MUTEX(reflck_lock);
+
+static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void)
+{
+	struct vfio_pci_reflck *reflck;
+
+	reflck = kzalloc(sizeof(*reflck), GFP_KERNEL);
+	if (!reflck)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&reflck->kref);
+	mutex_init(&reflck->lock);
+
+	return reflck;
+}
+
+static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck)
+{
+	kref_get(&reflck->kref);
+}
+
+static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
+{
+	struct vfio_pci_device *vdev = data;
+	struct vfio_pci_reflck **preflck = &vdev->reflck;
+	struct vfio_device *device;
+	struct vfio_pci_device *tmp;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		return 0;
+
+	if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
+		vfio_device_put(device);
+		return 0;
+	}
+
+	tmp = vfio_device_data(device);
+
+	if (tmp->reflck) {
+		vfio_pci_reflck_get(tmp->reflck);
+		*preflck = tmp->reflck;
+		vfio_device_put(device);
+		return 1;
+	}
+
+	vfio_device_put(device);
+	return 0;
+}
+
+int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
+{
+	bool slot = !pci_probe_reset_slot(vdev->pdev->slot);
+
+	mutex_lock(&reflck_lock);
+
+	if (pci_is_root_bus(vdev->pdev->bus) ||
+	    vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find,
+					  vdev, slot) <= 0)
+		vdev->reflck = vfio_pci_reflck_alloc();
+
+	mutex_unlock(&reflck_lock);
+
+	return PTR_ERR_OR_ZERO(vdev->reflck);
+}
+
+static void vfio_pci_reflck_release(struct kref *kref)
+{
+	struct vfio_pci_reflck *reflck = container_of(kref,
+						      struct vfio_pci_reflck,
+						      kref);
+
+	kfree(reflck);
+	mutex_unlock(&reflck_lock);
+}
+
+void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
+{
+	kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock);
+}
+
+struct vfio_devices {
+	struct vfio_device **devices;
+	struct vfio_pci_device *vdev;
+	int cur_index;
+	int max_index;
+};
+
+static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
+{
+	struct vfio_devices *devs = data;
+	struct vfio_device *device;
+	struct vfio_pci_device *tmp;
+
+	if (devs->cur_index == devs->max_index)
+		return -ENOSPC;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		return -EINVAL;
+
+	if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) {
+		vfio_device_put(device);
+		return -EBUSY;
+	}
+
+	tmp = vfio_device_data(device);
+
+	/* Fault if the device is not unused */
+	if (tmp->refcnt) {
+		vfio_device_put(device);
+		return -EBUSY;
+	}
+
+	devs->devices[devs->cur_index++] = device;
+	return 0;
+}
+
+/*
+ * If a bus or slot reset is available for the provided device and:
+ *  - All of the devices affected by that bus or slot reset are unused
+ *    (!refcnt)
+ *  - At least one of the affected devices is marked dirty via
+ *    needs_reset (such as by lack of FLR support)
+ * Then attempt to perform that bus or slot reset.  Callers are required
+ * to hold vdev->reflck->lock, protecting the bus/slot reset group from
+ * concurrent opens.  A vfio_device reference is acquired for each device
+ * to prevent unbinds during the reset operation.
+ *
+ * NB: vfio-core considers a group to be viable even if some devices are
+ * bound to drivers like pci-stub or pcieport.  Here we require all devices
+ * to be bound to vfio_pci since that's the only way we can be sure they
+ * stay put.
+ */
+static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
+{
+	struct vfio_devices devs = { .cur_index = 0 };
+	int i = 0, ret = -EINVAL;
+	bool slot = false;
+	struct vfio_pci_device *tmp;
+
+	if (!pci_probe_reset_slot(vdev->pdev->slot))
+		slot = true;
+	else if (pci_probe_reset_bus(vdev->pdev->bus))
+		return;
+
+	if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
+					  &i, slot) || !i)
+		return;
+
+	devs.max_index = i;
+	devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL);
+	if (!devs.devices)
+		return;
+
+	devs.vdev = vdev;
+	if (vfio_pci_for_each_slot_or_bus(vdev->pdev,
+					  vfio_pci_get_unused_devs,
+					  &devs, slot))
+		goto put_devs;
+
+	/* Does at least one need a reset? */
+	for (i = 0; i < devs.cur_index; i++) {
+		tmp = vfio_device_data(devs.devices[i]);
+		if (tmp->needs_reset) {
+			ret = pci_reset_bus(vdev->pdev);
+			break;
+		}
+	}
+
+put_devs:
+	for (i = 0; i < devs.cur_index; i++) {
+		tmp = vfio_device_data(devs.devices[i]);
+
+		/*
+		 * If reset was successful, affected devices no longer need
+		 * a reset and we should return all the collateral devices
+		 * to low power.  If not successful, we either didn't reset
+		 * the bus or timed out waiting for it, so let's not touch
+		 * the power state.
+		 */
+		if (!ret) {
+			tmp->needs_reset = false;
+
+			if (tmp != vdev && !tmp->disable_idle_d3)
+				vfio_pci_set_power_state(tmp, PCI_D3hot);
+		}
+
+		vfio_device_put(devs.devices[i]);
+	}
+
+	kfree(devs.devices);
+}
+
+static void __exit vfio_pci_cleanup(void)
+{
+	pci_unregister_driver(&vfio_pci_driver);
+	vfio_pci_uninit_perm_bits();
+}
+
+void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
+{
+	char *p, *id;
+	int rc;
+
+	/* no ids passed actually */
+	if (ids[0] == '\0')
+		return;
+
+	/* add ids specified in the module parameter */
+	p = ids;
+	while ((id = strsep(&p, ","))) {
+		unsigned int vendor, device, subvendor = PCI_ANY_ID,
+			subdevice = PCI_ANY_ID, class = 0, class_mask = 0;
+		int fields;
+
+		if (!strlen(id))
+			continue;
+
+		fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
+				&vendor, &device, &subvendor, &subdevice,
+				&class, &class_mask);
+
+		if (fields < 2) {
+			pr_warn("invalid id string \"%s\"\n", id);
+			continue;
+		}
+
+		rc = pci_add_dynid(driver, vendor, device,
+				   subvendor, subdevice, class, class_mask, 0);
+		if (rc)
+			pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
+				vendor, device, subvendor, subdevice,
+				class, class_mask, rc);
+		else
+			pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n",
+				vendor, device, subvendor, subdevice,
+				class, class_mask);
+	}
+}
+
+static int __init vfio_pci_init(void)
+{
+	int ret;
+
+	/* Allocate shared config space permision data used by all devices */
+	ret = vfio_pci_init_perm_bits();
+	if (ret)
+		return ret;
+
+	/* Register and scan for devices */
+	ret = pci_register_driver(&vfio_pci_driver);
+	if (ret)
+		goto out_driver;
+
+	vfio_pci_fill_ids(ids, &vfio_pci_driver);
+
+	return 0;
+
+out_driver:
+	vfio_pci_uninit_perm_bits();
+	return ret;
+}
+
+module_init(vfio_pci_init);
+module_exit(vfio_pci_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 06/10] vfio_pci: shrink vfio_pci_common.c
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (4 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 05/10] vfio_pci: duplicate vfio_pci.c Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 07/10] vfio_pci: shrink vfio_pci.c Liu Yi L
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch removes the vfio-pci module specific codes in vfio_pci_common.c
to make vfio_pci_common.c be a common source file.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_common.c | 235 -------------------------------------
 1 file changed, 235 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c
index 80a6b15..4664733a 100644
--- a/drivers/vfio/pci/vfio_pci_common.c
+++ b/drivers/vfio/pci/vfio_pci_common.c
@@ -30,30 +30,6 @@
 
 #include "vfio_pci_private.h"
 
-#define DRIVER_VERSION  "0.2"
-#define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
-#define DRIVER_DESC     "VFIO PCI - User Level meta-driver"
-
-static char ids[1024] __initdata;
-module_param_string(ids, ids, sizeof(ids), 0);
-MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
-
-static bool nointxmask;
-module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(nointxmask,
-		  "Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.");
-
-#ifdef CONFIG_VFIO_PCI_VGA
-static bool disable_vga;
-module_param(disable_vga, bool, S_IRUGO);
-MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci");
-#endif
-
-static bool disable_idle_d3;
-module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(disable_idle_d3,
-		 "Disable using the PCI D3 low power state for idle, unused devices");
-
 /*
  * Our VGA arbiter participation is limited since we don't know anything
  * about the device itself.  However, if the device is the only VGA device
@@ -456,49 +432,6 @@ void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
 	vdev->disable_idle_d3 = disable_idle_d3;
 }
 
-static void vfio_pci_release(void *device_data)
-{
-	struct vfio_pci_device *vdev = device_data;
-
-	mutex_lock(&vdev->reflck->lock);
-
-	if (!(--vdev->refcnt)) {
-		vfio_spapr_pci_eeh_release(vdev->pdev);
-		vfio_pci_disable(vdev);
-	}
-
-	mutex_unlock(&vdev->reflck->lock);
-
-	module_put(THIS_MODULE);
-}
-
-static int vfio_pci_open(void *device_data)
-{
-	struct vfio_pci_device *vdev = device_data;
-	int ret = 0;
-
-	if (!try_module_get(THIS_MODULE))
-		return -ENODEV;
-
-	vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3);
-
-	mutex_lock(&vdev->reflck->lock);
-
-	if (!vdev->refcnt) {
-		ret = vfio_pci_enable(vdev);
-		if (ret)
-			goto error;
-
-		vfio_spapr_pci_eeh_open(vdev->pdev);
-	}
-	vdev->refcnt++;
-error:
-	mutex_unlock(&vdev->reflck->lock);
-	if (ret)
-		module_put(THIS_MODULE);
-	return ret;
-}
-
 static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 {
 	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
@@ -1270,129 +1203,6 @@ void vfio_pci_request(void *device_data, unsigned int count)
 	mutex_unlock(&vdev->igate);
 }
 
-static const struct vfio_device_ops vfio_pci_ops = {
-	.name		= "vfio-pci",
-	.open		= vfio_pci_open,
-	.release	= vfio_pci_release,
-	.ioctl		= vfio_pci_ioctl,
-	.read		= vfio_pci_read,
-	.write		= vfio_pci_write,
-	.mmap		= vfio_pci_mmap,
-	.request	= vfio_pci_request,
-};
-
-static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
-{
-	struct vfio_pci_device *vdev;
-	struct iommu_group *group;
-	int ret;
-
-	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-		return -EINVAL;
-
-	/*
-	 * Prevent binding to PFs with VFs enabled, this too easily allows
-	 * userspace instance with VFs and PFs from the same device, which
-	 * cannot work.  Disabling SR-IOV here would initiate removing the
-	 * VFs, which would unbind the driver, which is prone to blocking
-	 * if that VF is also in use by vfio-pci.  Just reject these PFs
-	 * and let the user sort it out.
-	 */
-	if (pci_num_vf(pdev)) {
-		pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
-		return -EBUSY;
-	}
-
-	group = vfio_iommu_group_get(&pdev->dev);
-	if (!group)
-		return -EINVAL;
-
-	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
-	if (!vdev) {
-		vfio_iommu_group_put(group, &pdev->dev);
-		return -ENOMEM;
-	}
-
-	vdev->pdev = pdev;
-	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	mutex_init(&vdev->igate);
-	spin_lock_init(&vdev->irqlock);
-	mutex_init(&vdev->ioeventfds_lock);
-	INIT_LIST_HEAD(&vdev->ioeventfds_list);
-	vdev->nointxmask = nointxmask;
-#ifdef CONFIG_VFIO_PCI_VGA
-	vdev->disable_vga = disable_vga;
-#endif
-	vdev->disable_idle_d3 = disable_idle_d3;
-
-	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
-	if (ret) {
-		vfio_iommu_group_put(group, &pdev->dev);
-		kfree(vdev);
-		return ret;
-	}
-
-	ret = vfio_pci_reflck_attach(vdev);
-	if (ret) {
-		vfio_del_group_dev(&pdev->dev);
-		vfio_iommu_group_put(group, &pdev->dev);
-		kfree(vdev);
-		return ret;
-	}
-
-	if (vfio_pci_is_vga(pdev)) {
-		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
-		vga_set_legacy_decoding(pdev,
-					vfio_pci_set_vga_decode(vdev, false));
-	}
-
-	vfio_pci_probe_power_state(vdev);
-
-	if (!vdev->disable_idle_d3) {
-		/*
-		 * pci-core sets the device power state to an unknown value at
-		 * bootup and after being removed from a driver.  The only
-		 * transition it allows from this unknown state is to D0, which
-		 * typically happens when a driver calls pci_enable_device().
-		 * We're not ready to enable the device yet, but we do want to
-		 * be able to get to D3.  Therefore first do a D0 transition
-		 * before going to D3.
-		 */
-		vfio_pci_set_power_state(vdev, PCI_D0);
-		vfio_pci_set_power_state(vdev, PCI_D3hot);
-	}
-
-	return ret;
-}
-
-static void vfio_pci_remove(struct pci_dev *pdev)
-{
-	struct vfio_pci_device *vdev;
-
-	vdev = vfio_del_group_dev(&pdev->dev);
-	if (!vdev)
-		return;
-
-	vfio_pci_reflck_put(vdev->reflck);
-
-	vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
-	kfree(vdev->region);
-	mutex_destroy(&vdev->ioeventfds_lock);
-
-	if (!vdev->disable_idle_d3)
-		vfio_pci_set_power_state(vdev, PCI_D0);
-
-	kfree(vdev->pm_save);
-	kfree(vdev);
-
-	if (vfio_pci_is_vga(pdev)) {
-		vga_client_register(pdev, NULL, NULL, NULL);
-		vga_set_legacy_decoding(pdev,
-				VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
-				VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
-	}
-}
-
 static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 						  pci_channel_state_t state)
 {
@@ -1425,14 +1235,6 @@ static const struct pci_error_handlers vfio_err_handlers = {
 	.error_detected = vfio_pci_aer_err_detected,
 };
 
-static struct pci_driver vfio_pci_driver = {
-	.name		= "vfio-pci",
-	.id_table	= NULL, /* only dynamic ids */
-	.probe		= vfio_pci_probe,
-	.remove		= vfio_pci_remove,
-	.err_handler	= &vfio_err_handlers,
-};
-
 static DEFINE_MUTEX(reflck_lock);
 
 static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void)
@@ -1627,12 +1429,6 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 	kfree(devs.devices);
 }
 
-static void __exit vfio_pci_cleanup(void)
-{
-	pci_unregister_driver(&vfio_pci_driver);
-	vfio_pci_uninit_perm_bits();
-}
-
 void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 {
 	char *p, *id;
@@ -1673,34 +1469,3 @@ void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 				class, class_mask);
 	}
 }
-
-static int __init vfio_pci_init(void)
-{
-	int ret;
-
-	/* Allocate shared config space permision data used by all devices */
-	ret = vfio_pci_init_perm_bits();
-	if (ret)
-		return ret;
-
-	/* Register and scan for devices */
-	ret = pci_register_driver(&vfio_pci_driver);
-	if (ret)
-		goto out_driver;
-
-	vfio_pci_fill_ids(ids, &vfio_pci_driver);
-
-	return 0;
-
-out_driver:
-	vfio_pci_uninit_perm_bits();
-	return ret;
-}
-
-module_init(vfio_pci_init);
-module_exit(vfio_pci_cleanup);
-
-MODULE_VERSION(DRIVER_VERSION);
-MODULE_LICENSE("GPL v2");
-MODULE_AUTHOR(DRIVER_AUTHOR);
-MODULE_DESCRIPTION(DRIVER_DESC);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 07/10] vfio_pci: shrink vfio_pci.c
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (5 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 06/10] vfio_pci: shrink vfio_pci_common.c Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free Liu Yi L
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch removes the common codes in vfio_pci.c, instead, vfio-pci
module will leverage the common functions implemented in vfio_pci_common.c.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/Makefile           |    3 +-
 drivers/vfio/pci/vfio_pci.c         | 1440 -----------------------------------
 drivers/vfio/pci/vfio_pci_common.c  |    2 +-
 drivers/vfio/pci/vfio_pci_private.h |    2 +
 4 files changed, 5 insertions(+), 1442 deletions(-)

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f027f8a..d94317a 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci.o vfio_pci_common.o vfio_pci_intrs.o \
+		vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 80a6b15..7e24da2 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -54,408 +54,6 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(disable_idle_d3,
 		 "Disable using the PCI D3 low power state for idle, unused devices");
 
-/*
- * Our VGA arbiter participation is limited since we don't know anything
- * about the device itself.  However, if the device is the only VGA device
- * downstream of a bridge and VFIO VGA support is disabled, then we can
- * safely return legacy VGA IO and memory as not decoded since the user
- * has no way to get to it and routing can be disabled externally at the
- * bridge.
- */
-unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
-{
-	struct vfio_pci_device *vdev = opaque;
-	struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
-	unsigned char max_busnr;
-	unsigned int decodes;
-
-	if (single_vga || !vfio_vga_disabled(vdev) ||
-		pci_is_root_bus(pdev->bus))
-		return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
-		       VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
-
-	max_busnr = pci_bus_max_busnr(pdev->bus);
-	decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
-
-	while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) {
-		if (tmp == pdev ||
-		    pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) ||
-		    pci_is_root_bus(tmp->bus))
-			continue;
-
-		if (tmp->bus->number >= pdev->bus->number &&
-		    tmp->bus->number <= max_busnr) {
-			pci_dev_put(tmp);
-			decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
-			break;
-		}
-	}
-
-	return decodes;
-}
-
-static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
-{
-	struct resource *res;
-	int bar;
-	struct vfio_pci_dummy_resource *dummy_res;
-
-	INIT_LIST_HEAD(&vdev->dummy_resources_list);
-
-	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
-		res = vdev->pdev->resource + bar;
-
-		if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
-			goto no_mmap;
-
-		if (!(res->flags & IORESOURCE_MEM))
-			goto no_mmap;
-
-		/*
-		 * The PCI core shouldn't set up a resource with a
-		 * type but zero size. But there may be bugs that
-		 * cause us to do that.
-		 */
-		if (!resource_size(res))
-			goto no_mmap;
-
-		if (resource_size(res) >= PAGE_SIZE) {
-			vdev->bar_mmap_supported[bar] = true;
-			continue;
-		}
-
-		if (!(res->start & ~PAGE_MASK)) {
-			/*
-			 * Add a dummy resource to reserve the remainder
-			 * of the exclusive page in case that hot-add
-			 * device's bar is assigned into it.
-			 */
-			dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL);
-			if (dummy_res == NULL)
-				goto no_mmap;
-
-			dummy_res->resource.name = "vfio sub-page reserved";
-			dummy_res->resource.start = res->end + 1;
-			dummy_res->resource.end = res->start + PAGE_SIZE - 1;
-			dummy_res->resource.flags = res->flags;
-			if (request_resource(res->parent,
-						&dummy_res->resource)) {
-				kfree(dummy_res);
-				goto no_mmap;
-			}
-			dummy_res->index = bar;
-			list_add(&dummy_res->res_next,
-					&vdev->dummy_resources_list);
-			vdev->bar_mmap_supported[bar] = true;
-			continue;
-		}
-		/*
-		 * Here we don't handle the case when the BAR is not page
-		 * aligned because we can't expect the BAR will be
-		 * assigned into the same location in a page in guest
-		 * when we passthrough the BAR. And it's hard to access
-		 * this BAR in userspace because we have no way to get
-		 * the BAR's location in a page.
-		 */
-no_mmap:
-		vdev->bar_mmap_supported[bar] = false;
-	}
-}
-
-static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
-
-/*
- * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
- * _and_ the ability detect when the device is asserting INTx via PCI_STATUS.
- * If a device implements the former but not the latter we would typically
- * expect broken_intx_masking be set and require an exclusive interrupt.
- * However since we do have control of the device's ability to assert INTx,
- * we can instead pretend that the device does not implement INTx, virtualizing
- * the pin register to report zero and maintaining DisINTx set on the host.
- */
-static bool vfio_pci_nointx(struct pci_dev *pdev)
-{
-	switch (pdev->vendor) {
-	case PCI_VENDOR_ID_INTEL:
-		switch (pdev->device) {
-		/* All i40e (XL710/X710/XXV710) 10/20/25/40GbE NICs */
-		case 0x1572:
-		case 0x1574:
-		case 0x1580 ... 0x1581:
-		case 0x1583 ... 0x158b:
-		case 0x37d0 ... 0x37d2:
-			return true;
-		default:
-			return false;
-		}
-	}
-
-	return false;
-}
-
-void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	u16 pmcsr;
-
-	if (!pdev->pm_cap)
-		return;
-
-	pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr);
-
-	vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET);
-}
-
-/*
- * pci_set_power_state() wrapper handling devices which perform a soft reset on
- * D3->D0 transition.  Save state prior to D0/1/2->D3, stash it on the vdev,
- * restore when returned to D0.  Saved separately from pci_saved_state for use
- * by PM capability emulation and separately from pci_dev internal saved state
- * to avoid it being overwritten and consumed around other resets.
- */
-int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	bool needs_restore = false, needs_save = false;
-	int ret;
-
-	if (vdev->needs_pm_restore) {
-		if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) {
-			pci_save_state(pdev);
-			needs_save = true;
-		}
-
-		if (pdev->current_state >= PCI_D3hot && state <= PCI_D0)
-			needs_restore = true;
-	}
-
-	ret = pci_set_power_state(pdev, state);
-
-	if (!ret) {
-		/* D3 might be unsupported via quirk, skip unless in D3 */
-		if (needs_save && pdev->current_state >= PCI_D3hot) {
-			vdev->pm_save = pci_store_saved_state(pdev);
-		} else if (needs_restore) {
-			pci_load_and_free_saved_state(pdev, &vdev->pm_save);
-			pci_restore_state(pdev);
-		}
-	}
-
-	return ret;
-}
-
-int vfio_pci_enable(struct vfio_pci_device *vdev)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	int ret;
-	u16 cmd;
-	u8 msix_pos;
-
-	vfio_pci_set_power_state(vdev, PCI_D0);
-
-	/* Don't allow our initial saved state to include busmaster */
-	pci_clear_master(pdev);
-
-	ret = pci_enable_device(pdev);
-	if (ret)
-		return ret;
-
-	/* If reset fails because of the device lock, fail this path entirely */
-	ret = pci_try_reset_function(pdev);
-	if (ret == -EAGAIN) {
-		pci_disable_device(pdev);
-		return ret;
-	}
-
-	vdev->reset_works = !ret;
-	pci_save_state(pdev);
-	vdev->pci_saved_state = pci_store_saved_state(pdev);
-	if (!vdev->pci_saved_state)
-		pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
-
-	if (likely(!vdev->nointxmask)) {
-		if (vfio_pci_nointx(pdev)) {
-			pci_info(pdev, "Masking broken INTx support\n");
-			vdev->nointx = true;
-			pci_intx(pdev, 0);
-		} else
-			vdev->pci_2_3 = pci_intx_mask_supported(pdev);
-	}
-
-	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
-	if (vdev->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) {
-		cmd &= ~PCI_COMMAND_INTX_DISABLE;
-		pci_write_config_word(pdev, PCI_COMMAND, cmd);
-	}
-
-	ret = vfio_config_init(vdev);
-	if (ret) {
-		kfree(vdev->pci_saved_state);
-		vdev->pci_saved_state = NULL;
-		pci_disable_device(pdev);
-		return ret;
-	}
-
-	msix_pos = pdev->msix_cap;
-	if (msix_pos) {
-		u16 flags;
-		u32 table;
-
-		pci_read_config_word(pdev, msix_pos + PCI_MSIX_FLAGS, &flags);
-		pci_read_config_dword(pdev, msix_pos + PCI_MSIX_TABLE, &table);
-
-		vdev->msix_bar = table & PCI_MSIX_TABLE_BIR;
-		vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET;
-		vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16;
-	} else
-		vdev->msix_bar = 0xFF;
-
-	if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev))
-		vdev->has_vga = true;
-
-
-	if (vfio_pci_is_vga(pdev) &&
-	    pdev->vendor == PCI_VENDOR_ID_INTEL &&
-	    IS_ENABLED(CONFIG_VFIO_PCI_IGD)) {
-		ret = vfio_pci_igd_init(vdev);
-		if (ret) {
-			pci_warn(pdev, "Failed to setup Intel IGD regions\n");
-			goto disable_exit;
-		}
-	}
-
-	if (pdev->vendor == PCI_VENDOR_ID_NVIDIA &&
-	    IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {
-		ret = vfio_pci_nvdia_v100_nvlink2_init(vdev);
-		if (ret && ret != -ENODEV) {
-			pci_warn(pdev, "Failed to setup NVIDIA NV2 RAM region\n");
-			goto disable_exit;
-		}
-	}
-
-	if (pdev->vendor == PCI_VENDOR_ID_IBM &&
-	    IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {
-		ret = vfio_pci_ibm_npu2_init(vdev);
-		if (ret && ret != -ENODEV) {
-			pci_warn(pdev, "Failed to setup NVIDIA NV2 ATSD region\n");
-			goto disable_exit;
-		}
-	}
-
-	vfio_pci_probe_mmaps(vdev);
-
-	return 0;
-
-disable_exit:
-	vfio_pci_disable(vdev);
-	return ret;
-}
-
-void vfio_pci_disable(struct vfio_pci_device *vdev)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	struct vfio_pci_dummy_resource *dummy_res, *tmp;
-	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
-	int i, bar;
-
-	/* Stop the device from further DMA */
-	pci_clear_master(pdev);
-
-	vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
-				VFIO_IRQ_SET_ACTION_TRIGGER,
-				vdev->irq_type, 0, 0, NULL);
-
-	/* Device closed, don't need mutex here */
-	list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
-				 &vdev->ioeventfds_list, next) {
-		vfio_virqfd_disable(&ioeventfd->virqfd);
-		list_del(&ioeventfd->next);
-		kfree(ioeventfd);
-	}
-	vdev->ioeventfds_nr = 0;
-
-	vdev->virq_disabled = false;
-
-	for (i = 0; i < vdev->num_regions; i++)
-		vdev->region[i].ops->release(vdev, &vdev->region[i]);
-
-	vdev->num_regions = 0;
-	kfree(vdev->region);
-	vdev->region = NULL; /* don't krealloc a freed pointer */
-
-	vfio_config_free(vdev);
-
-	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
-		if (!vdev->barmap[bar])
-			continue;
-		pci_iounmap(pdev, vdev->barmap[bar]);
-		pci_release_selected_regions(pdev, 1 << bar);
-		vdev->barmap[bar] = NULL;
-	}
-
-	list_for_each_entry_safe(dummy_res, tmp,
-				 &vdev->dummy_resources_list, res_next) {
-		list_del(&dummy_res->res_next);
-		release_resource(&dummy_res->resource);
-		kfree(dummy_res);
-	}
-
-	vdev->needs_reset = true;
-
-	/*
-	 * If we have saved state, restore it.  If we can reset the device,
-	 * even better.  Resetting with current state seems better than
-	 * nothing, but saving and restoring current state without reset
-	 * is just busy work.
-	 */
-	if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) {
-		pci_info(pdev, "%s: Couldn't reload saved state\n", __func__);
-
-		if (!vdev->reset_works)
-			goto out;
-
-		pci_save_state(pdev);
-	}
-
-	/*
-	 * Disable INTx and MSI, presumably to avoid spurious interrupts
-	 * during reset.  Stolen from pci_reset_function()
-	 */
-	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
-
-	/*
-	 * Try to get the locks ourselves to prevent a deadlock. The
-	 * success of this is dependent on being able to lock the device,
-	 * which is not always possible.
-	 * We can not use the "try" reset interface here, which will
-	 * overwrite the previously restored configuration information.
-	 */
-	if (vdev->reset_works && pci_cfg_access_trylock(pdev)) {
-		if (device_trylock(&pdev->dev)) {
-			if (!__pci_reset_function_locked(pdev))
-				vdev->needs_reset = false;
-			device_unlock(&pdev->dev);
-		}
-		pci_cfg_access_unlock(pdev);
-	}
-
-	pci_restore_state(pdev);
-out:
-	pci_disable_device(pdev);
-
-	vfio_pci_try_bus_reset(vdev);
-
-	if (!vdev->disable_idle_d3)
-		vfio_pci_set_power_state(vdev, PCI_D3hot);
-}
-
-void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
-			bool nointxmask, bool disable_idle_d3)
-{
-	vdev->nointxmask = nointxmask;
-	vdev->disable_idle_d3 = disable_idle_d3;
-}
-
 static void vfio_pci_release(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
@@ -499,777 +97,6 @@ static int vfio_pci_open(void *device_data)
 	return ret;
 }
 
-static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
-{
-	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
-		u8 pin;
-
-		if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) ||
-		    vdev->nointx || vdev->pdev->is_virtfn)
-			return 0;
-
-		pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
-
-		return pin ? 1 : 0;
-	} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
-		u8 pos;
-		u16 flags;
-
-		pos = vdev->pdev->msi_cap;
-		if (pos) {
-			pci_read_config_word(vdev->pdev,
-					     pos + PCI_MSI_FLAGS, &flags);
-			return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1);
-		}
-	} else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
-		u8 pos;
-		u16 flags;
-
-		pos = vdev->pdev->msix_cap;
-		if (pos) {
-			pci_read_config_word(vdev->pdev,
-					     pos + PCI_MSIX_FLAGS, &flags);
-
-			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
-		}
-	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
-		if (pci_is_pcie(vdev->pdev))
-			return 1;
-	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
-		return 1;
-	}
-
-	return 0;
-}
-
-static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
-{
-	(*(int *)data)++;
-	return 0;
-}
-
-struct vfio_pci_fill_info {
-	int max;
-	int cur;
-	struct vfio_pci_dependent_device *devices;
-};
-
-static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
-{
-	struct vfio_pci_fill_info *fill = data;
-	struct iommu_group *iommu_group;
-
-	if (fill->cur == fill->max)
-		return -EAGAIN; /* Something changed, try again */
-
-	iommu_group = iommu_group_get(&pdev->dev);
-	if (!iommu_group)
-		return -EPERM; /* Cannot reset non-isolated devices */
-
-	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
-	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
-	fill->devices[fill->cur].bus = pdev->bus->number;
-	fill->devices[fill->cur].devfn = pdev->devfn;
-	fill->cur++;
-	iommu_group_put(iommu_group);
-	return 0;
-}
-
-struct vfio_pci_group_entry {
-	struct vfio_group *group;
-	int id;
-};
-
-struct vfio_pci_group_info {
-	int count;
-	struct vfio_pci_group_entry *groups;
-};
-
-static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data)
-{
-	struct vfio_pci_group_info *info = data;
-	struct iommu_group *group;
-	int id, i;
-
-	group = iommu_group_get(&pdev->dev);
-	if (!group)
-		return -EPERM;
-
-	id = iommu_group_id(group);
-
-	for (i = 0; i < info->count; i++)
-		if (info->groups[i].id == id)
-			break;
-
-	iommu_group_put(group);
-
-	return (i == info->count) ? -EINVAL : 0;
-}
-
-static bool vfio_pci_dev_below_slot(struct pci_dev *pdev, struct pci_slot *slot)
-{
-	for (; pdev; pdev = pdev->bus->self)
-		if (pdev->bus == slot->bus)
-			return (pdev->slot == slot);
-	return false;
-}
-
-struct vfio_pci_walk_info {
-	int (*fn)(struct pci_dev *, void *data);
-	void *data;
-	struct pci_dev *pdev;
-	bool slot;
-	int ret;
-};
-
-static int vfio_pci_walk_wrapper(struct pci_dev *pdev, void *data)
-{
-	struct vfio_pci_walk_info *walk = data;
-
-	if (!walk->slot || vfio_pci_dev_below_slot(pdev, walk->pdev->slot))
-		walk->ret = walk->fn(pdev, walk->data);
-
-	return walk->ret;
-}
-
-static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
-					 int (*fn)(struct pci_dev *,
-						   void *data), void *data,
-					 bool slot)
-{
-	struct vfio_pci_walk_info walk = {
-		.fn = fn, .data = data, .pdev = pdev, .slot = slot, .ret = 0,
-	};
-
-	pci_walk_bus(pdev->bus, vfio_pci_walk_wrapper, &walk);
-
-	return walk.ret;
-}
-
-static int msix_mmappable_cap(struct vfio_pci_device *vdev,
-			      struct vfio_info_cap *caps)
-{
-	struct vfio_info_cap_header header = {
-		.id = VFIO_REGION_INFO_CAP_MSIX_MAPPABLE,
-		.version = 1
-	};
-
-	return vfio_info_add_capability(caps, &header, sizeof(header));
-}
-
-int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
-				 unsigned int type, unsigned int subtype,
-				 const struct vfio_pci_regops *ops,
-				 size_t size, u32 flags, void *data)
-{
-	struct vfio_pci_region *region;
-
-	region = krealloc(vdev->region,
-			  (vdev->num_regions + 1) * sizeof(*region),
-			  GFP_KERNEL);
-	if (!region)
-		return -ENOMEM;
-
-	vdev->region = region;
-	vdev->region[vdev->num_regions].type = type;
-	vdev->region[vdev->num_regions].subtype = subtype;
-	vdev->region[vdev->num_regions].ops = ops;
-	vdev->region[vdev->num_regions].size = size;
-	vdev->region[vdev->num_regions].flags = flags;
-	vdev->region[vdev->num_regions].data = data;
-
-	vdev->num_regions++;
-
-	return 0;
-}
-
-long vfio_pci_ioctl(void *device_data,
-		   unsigned int cmd, unsigned long arg)
-{
-	struct vfio_pci_device *vdev = device_data;
-	unsigned long minsz;
-
-	if (cmd == VFIO_DEVICE_GET_INFO) {
-		struct vfio_device_info info;
-
-		minsz = offsetofend(struct vfio_device_info, num_irqs);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz)
-			return -EINVAL;
-
-		info.flags = VFIO_DEVICE_FLAGS_PCI;
-
-		if (vdev->reset_works)
-			info.flags |= VFIO_DEVICE_FLAGS_RESET;
-
-		info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
-		info.num_irqs = VFIO_PCI_NUM_IRQS;
-
-		return copy_to_user((void __user *)arg, &info, minsz) ?
-			-EFAULT : 0;
-
-	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
-		struct pci_dev *pdev = vdev->pdev;
-		struct vfio_region_info info;
-		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
-		int i, ret;
-
-		minsz = offsetofend(struct vfio_region_info, offset);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz)
-			return -EINVAL;
-
-		switch (info.index) {
-		case VFIO_PCI_CONFIG_REGION_INDEX:
-			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.size = pdev->cfg_size;
-			info.flags = VFIO_REGION_INFO_FLAG_READ |
-				     VFIO_REGION_INFO_FLAG_WRITE;
-			break;
-		case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
-			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.size = pci_resource_len(pdev, info.index);
-			if (!info.size) {
-				info.flags = 0;
-				break;
-			}
-
-			info.flags = VFIO_REGION_INFO_FLAG_READ |
-				     VFIO_REGION_INFO_FLAG_WRITE;
-			if (vdev->bar_mmap_supported[info.index]) {
-				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
-				if (info.index == vdev->msix_bar) {
-					ret = msix_mmappable_cap(vdev, &caps);
-					if (ret)
-						return ret;
-				}
-			}
-
-			break;
-		case VFIO_PCI_ROM_REGION_INDEX:
-		{
-			void __iomem *io;
-			size_t size;
-			u16 orig_cmd;
-
-			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.flags = 0;
-
-			/* Report the BAR size, not the ROM size */
-			info.size = pci_resource_len(pdev, info.index);
-			if (!info.size) {
-				/* Shadow ROMs appear as PCI option ROMs */
-				if (pdev->resource[PCI_ROM_RESOURCE].flags &
-							IORESOURCE_ROM_SHADOW)
-					info.size = 0x20000;
-				else
-					break;
-			}
-
-			/*
-			 * Is it really there?  Enable memory decode for
-			 * implicit access in pci_map_rom().
-			 */
-			pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd);
-			pci_write_config_word(pdev, PCI_COMMAND,
-					      orig_cmd | PCI_COMMAND_MEMORY);
-
-			io = pci_map_rom(pdev, &size);
-			if (io) {
-				info.flags = VFIO_REGION_INFO_FLAG_READ;
-				pci_unmap_rom(pdev, io);
-			} else {
-				info.size = 0;
-			}
-
-			pci_write_config_word(pdev, PCI_COMMAND, orig_cmd);
-			break;
-		}
-		case VFIO_PCI_VGA_REGION_INDEX:
-			if (!vdev->has_vga)
-				return -EINVAL;
-
-			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.size = 0xc0000;
-			info.flags = VFIO_REGION_INFO_FLAG_READ |
-				     VFIO_REGION_INFO_FLAG_WRITE;
-
-			break;
-		default:
-		{
-			struct vfio_region_info_cap_type cap_type = {
-					.header.id = VFIO_REGION_INFO_CAP_TYPE,
-					.header.version = 1 };
-
-			if (info.index >=
-			    VFIO_PCI_NUM_REGIONS + vdev->num_regions)
-				return -EINVAL;
-			info.index = array_index_nospec(info.index,
-							VFIO_PCI_NUM_REGIONS +
-							vdev->num_regions);
-
-			i = info.index - VFIO_PCI_NUM_REGIONS;
-
-			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.size = vdev->region[i].size;
-			info.flags = vdev->region[i].flags;
-
-			cap_type.type = vdev->region[i].type;
-			cap_type.subtype = vdev->region[i].subtype;
-
-			ret = vfio_info_add_capability(&caps, &cap_type.header,
-						       sizeof(cap_type));
-			if (ret)
-				return ret;
-
-			if (vdev->region[i].ops->add_capability) {
-				ret = vdev->region[i].ops->add_capability(vdev,
-						&vdev->region[i], &caps);
-				if (ret)
-					return ret;
-			}
-		}
-		}
-
-		if (caps.size) {
-			info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
-			if (info.argsz < sizeof(info) + caps.size) {
-				info.argsz = sizeof(info) + caps.size;
-				info.cap_offset = 0;
-			} else {
-				vfio_info_cap_shift(&caps, sizeof(info));
-				if (copy_to_user((void __user *)arg +
-						  sizeof(info), caps.buf,
-						  caps.size)) {
-					kfree(caps.buf);
-					return -EFAULT;
-				}
-				info.cap_offset = sizeof(info);
-			}
-
-			kfree(caps.buf);
-		}
-
-		return copy_to_user((void __user *)arg, &info, minsz) ?
-			-EFAULT : 0;
-
-	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
-		struct vfio_irq_info info;
-
-		minsz = offsetofend(struct vfio_irq_info, count);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
-			return -EINVAL;
-
-		switch (info.index) {
-		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
-		case VFIO_PCI_REQ_IRQ_INDEX:
-			break;
-		case VFIO_PCI_ERR_IRQ_INDEX:
-			if (pci_is_pcie(vdev->pdev))
-				break;
-		/* fall through */
-		default:
-			return -EINVAL;
-		}
-
-		info.flags = VFIO_IRQ_INFO_EVENTFD;
-
-		info.count = vfio_pci_get_irq_count(vdev, info.index);
-
-		if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
-			info.flags |= (VFIO_IRQ_INFO_MASKABLE |
-				       VFIO_IRQ_INFO_AUTOMASKED);
-		else
-			info.flags |= VFIO_IRQ_INFO_NORESIZE;
-
-		return copy_to_user((void __user *)arg, &info, minsz) ?
-			-EFAULT : 0;
-
-	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
-		struct vfio_irq_set hdr;
-		u8 *data = NULL;
-		int max, ret = 0;
-		size_t data_size = 0;
-
-		minsz = offsetofend(struct vfio_irq_set, count);
-
-		if (copy_from_user(&hdr, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		max = vfio_pci_get_irq_count(vdev, hdr.index);
-
-		ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
-						 VFIO_PCI_NUM_IRQS, &data_size);
-		if (ret)
-			return ret;
-
-		if (data_size) {
-			data = memdup_user((void __user *)(arg + minsz),
-					    data_size);
-			if (IS_ERR(data))
-				return PTR_ERR(data);
-		}
-
-		mutex_lock(&vdev->igate);
-
-		ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index,
-					      hdr.start, hdr.count, data);
-
-		mutex_unlock(&vdev->igate);
-		kfree(data);
-
-		return ret;
-
-	} else if (cmd == VFIO_DEVICE_RESET) {
-		return vdev->reset_works ?
-			pci_try_reset_function(vdev->pdev) : -EINVAL;
-
-	} else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) {
-		struct vfio_pci_hot_reset_info hdr;
-		struct vfio_pci_fill_info fill = { 0 };
-		struct vfio_pci_dependent_device *devices = NULL;
-		bool slot = false;
-		int ret = 0;
-
-		minsz = offsetofend(struct vfio_pci_hot_reset_info, count);
-
-		if (copy_from_user(&hdr, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (hdr.argsz < minsz)
-			return -EINVAL;
-
-		hdr.flags = 0;
-
-		/* Can we do a slot or bus reset or neither? */
-		if (!pci_probe_reset_slot(vdev->pdev->slot))
-			slot = true;
-		else if (pci_probe_reset_bus(vdev->pdev->bus))
-			return -ENODEV;
-
-		/* How many devices are affected? */
-		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
-						    vfio_pci_count_devs,
-						    &fill.max, slot);
-		if (ret)
-			return ret;
-
-		WARN_ON(!fill.max); /* Should always be at least one */
-
-		/*
-		 * If there's enough space, fill it now, otherwise return
-		 * -ENOSPC and the number of devices affected.
-		 */
-		if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
-			ret = -ENOSPC;
-			hdr.count = fill.max;
-			goto reset_info_exit;
-		}
-
-		devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
-		if (!devices)
-			return -ENOMEM;
-
-		fill.devices = devices;
-
-		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
-						    vfio_pci_fill_devs,
-						    &fill, slot);
-
-		/*
-		 * If a device was removed between counting and filling,
-		 * we may come up short of fill.max.  If a device was
-		 * added, we'll have a return of -EAGAIN above.
-		 */
-		if (!ret)
-			hdr.count = fill.cur;
-
-reset_info_exit:
-		if (copy_to_user((void __user *)arg, &hdr, minsz))
-			ret = -EFAULT;
-
-		if (!ret) {
-			if (copy_to_user((void __user *)(arg + minsz), devices,
-					 hdr.count * sizeof(*devices)))
-				ret = -EFAULT;
-		}
-
-		kfree(devices);
-		return ret;
-
-	} else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) {
-		struct vfio_pci_hot_reset hdr;
-		int32_t *group_fds;
-		struct vfio_pci_group_entry *groups;
-		struct vfio_pci_group_info info;
-		bool slot = false;
-		int i, count = 0, ret = 0;
-
-		minsz = offsetofend(struct vfio_pci_hot_reset, count);
-
-		if (copy_from_user(&hdr, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (hdr.argsz < minsz || hdr.flags)
-			return -EINVAL;
-
-		/* Can we do a slot or bus reset or neither? */
-		if (!pci_probe_reset_slot(vdev->pdev->slot))
-			slot = true;
-		else if (pci_probe_reset_bus(vdev->pdev->bus))
-			return -ENODEV;
-
-		/*
-		 * We can't let userspace give us an arbitrarily large
-		 * buffer to copy, so verify how many we think there
-		 * could be.  Note groups can have multiple devices so
-		 * one group per device is the max.
-		 */
-		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
-						    vfio_pci_count_devs,
-						    &count, slot);
-		if (ret)
-			return ret;
-
-		/* Somewhere between 1 and count is OK */
-		if (!hdr.count || hdr.count > count)
-			return -EINVAL;
-
-		group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
-		groups = kcalloc(hdr.count, sizeof(*groups), GFP_KERNEL);
-		if (!group_fds || !groups) {
-			kfree(group_fds);
-			kfree(groups);
-			return -ENOMEM;
-		}
-
-		if (copy_from_user(group_fds, (void __user *)(arg + minsz),
-				   hdr.count * sizeof(*group_fds))) {
-			kfree(group_fds);
-			kfree(groups);
-			return -EFAULT;
-		}
-
-		/*
-		 * For each group_fd, get the group through the vfio external
-		 * user interface and store the group and iommu ID.  This
-		 * ensures the group is held across the reset.
-		 */
-		for (i = 0; i < hdr.count; i++) {
-			struct vfio_group *group;
-			struct fd f = fdget(group_fds[i]);
-			if (!f.file) {
-				ret = -EBADF;
-				break;
-			}
-
-			group = vfio_group_get_external_user(f.file);
-			fdput(f);
-			if (IS_ERR(group)) {
-				ret = PTR_ERR(group);
-				break;
-			}
-
-			groups[i].group = group;
-			groups[i].id = vfio_external_user_iommu_id(group);
-		}
-
-		kfree(group_fds);
-
-		/* release reference to groups on error */
-		if (ret)
-			goto hot_reset_release;
-
-		info.count = hdr.count;
-		info.groups = groups;
-
-		/*
-		 * Test whether all the affected devices are contained
-		 * by the set of groups provided by the user.
-		 */
-		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
-						    vfio_pci_validate_devs,
-						    &info, slot);
-		if (!ret)
-			/* User has access, do the reset */
-			ret = pci_reset_bus(vdev->pdev);
-
-hot_reset_release:
-		for (i--; i >= 0; i--)
-			vfio_group_put_external_user(groups[i].group);
-
-		kfree(groups);
-		return ret;
-	} else if (cmd == VFIO_DEVICE_IOEVENTFD) {
-		struct vfio_device_ioeventfd ioeventfd;
-		int count;
-
-		minsz = offsetofend(struct vfio_device_ioeventfd, fd);
-
-		if (copy_from_user(&ioeventfd, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (ioeventfd.argsz < minsz)
-			return -EINVAL;
-
-		if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK)
-			return -EINVAL;
-
-		count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK;
-
-		if (hweight8(count) != 1 || ioeventfd.fd < -1)
-			return -EINVAL;
-
-		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
-					  ioeventfd.data, count, ioeventfd.fd);
-	}
-
-	return -ENOTTY;
-}
-
-static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
-			   size_t count, loff_t *ppos, bool iswrite)
-{
-	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
-	struct vfio_pci_device *vdev = device_data;
-
-	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
-		return -EINVAL;
-
-	switch (index) {
-	case VFIO_PCI_CONFIG_REGION_INDEX:
-		return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite);
-
-	case VFIO_PCI_ROM_REGION_INDEX:
-		if (iswrite)
-			return -EINVAL;
-		return vfio_pci_bar_rw(vdev, buf, count, ppos, false);
-
-	case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
-		return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite);
-
-	case VFIO_PCI_VGA_REGION_INDEX:
-		return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite);
-	default:
-		index -= VFIO_PCI_NUM_REGIONS;
-		return vdev->region[index].ops->rw(vdev, buf,
-						   count, ppos, iswrite);
-	}
-
-	return -EINVAL;
-}
-
-ssize_t vfio_pci_read(void *device_data, char __user *buf,
-			     size_t count, loff_t *ppos)
-{
-	if (!count)
-		return 0;
-
-	return vfio_pci_rw(device_data, buf, count, ppos, false);
-}
-
-ssize_t vfio_pci_write(void *device_data, const char __user *buf,
-			      size_t count, loff_t *ppos)
-{
-	if (!count)
-		return 0;
-
-	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
-}
-
-int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
-{
-	struct vfio_pci_device *vdev = device_data;
-	struct pci_dev *pdev = vdev->pdev;
-	unsigned int index;
-	u64 phys_len, req_len, pgoff, req_start;
-	int ret;
-
-	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
-
-	if (vma->vm_end < vma->vm_start)
-		return -EINVAL;
-	if ((vma->vm_flags & VM_SHARED) == 0)
-		return -EINVAL;
-	if (index >= VFIO_PCI_NUM_REGIONS) {
-		int regnum = index - VFIO_PCI_NUM_REGIONS;
-		struct vfio_pci_region *region = vdev->region + regnum;
-
-		if (region && region->ops && region->ops->mmap &&
-		    (region->flags & VFIO_REGION_INFO_FLAG_MMAP))
-			return region->ops->mmap(vdev, region, vma);
-		return -EINVAL;
-	}
-	if (index >= VFIO_PCI_ROM_REGION_INDEX)
-		return -EINVAL;
-	if (!vdev->bar_mmap_supported[index])
-		return -EINVAL;
-
-	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
-	req_len = vma->vm_end - vma->vm_start;
-	pgoff = vma->vm_pgoff &
-		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
-	req_start = pgoff << PAGE_SHIFT;
-
-	if (req_start + req_len > phys_len)
-		return -EINVAL;
-
-	/*
-	 * Even though we don't make use of the barmap for the mmap,
-	 * we need to request the region and the barmap tracks that.
-	 */
-	if (!vdev->barmap[index]) {
-		ret = pci_request_selected_regions(pdev,
-						   1 << index, "vfio-pci");
-		if (ret)
-			return ret;
-
-		vdev->barmap[index] = pci_iomap(pdev, index, 0);
-		if (!vdev->barmap[index]) {
-			pci_release_selected_regions(pdev, 1 << index);
-			return -ENOMEM;
-		}
-	}
-
-	vma->vm_private_data = vdev;
-	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
-
-	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			       req_len, vma->vm_page_prot);
-}
-
-void vfio_pci_request(void *device_data, unsigned int count)
-{
-	struct vfio_pci_device *vdev = device_data;
-	struct pci_dev *pdev = vdev->pdev;
-
-	mutex_lock(&vdev->igate);
-
-	if (vdev->req_trigger) {
-		if (!(count % 10))
-			pci_notice_ratelimited(pdev,
-				"Relaying device request to user (#%u)\n",
-				count);
-		eventfd_signal(vdev->req_trigger, 1);
-	} else if (count == 0) {
-		pci_warn(pdev,
-			"No device request channel registered, blocked until released by user\n");
-	}
-
-	mutex_unlock(&vdev->igate);
-}
-
 static const struct vfio_device_ops vfio_pci_ops = {
 	.name		= "vfio-pci",
 	.open		= vfio_pci_open,
@@ -1393,38 +220,6 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 	}
 }
 
-static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
-						  pci_channel_state_t state)
-{
-	struct vfio_pci_device *vdev;
-	struct vfio_device *device;
-
-	device = vfio_device_get_from_dev(&pdev->dev);
-	if (device == NULL)
-		return PCI_ERS_RESULT_DISCONNECT;
-
-	vdev = vfio_device_data(device);
-	if (vdev == NULL) {
-		vfio_device_put(device);
-		return PCI_ERS_RESULT_DISCONNECT;
-	}
-
-	mutex_lock(&vdev->igate);
-
-	if (vdev->err_trigger)
-		eventfd_signal(vdev->err_trigger, 1);
-
-	mutex_unlock(&vdev->igate);
-
-	vfio_device_put(device);
-
-	return PCI_ERS_RESULT_CAN_RECOVER;
-}
-
-static const struct pci_error_handlers vfio_err_handlers = {
-	.error_detected = vfio_pci_aer_err_detected,
-};
-
 static struct pci_driver vfio_pci_driver = {
 	.name		= "vfio-pci",
 	.id_table	= NULL, /* only dynamic ids */
@@ -1433,247 +228,12 @@ static struct pci_driver vfio_pci_driver = {
 	.err_handler	= &vfio_err_handlers,
 };
 
-static DEFINE_MUTEX(reflck_lock);
-
-static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void)
-{
-	struct vfio_pci_reflck *reflck;
-
-	reflck = kzalloc(sizeof(*reflck), GFP_KERNEL);
-	if (!reflck)
-		return ERR_PTR(-ENOMEM);
-
-	kref_init(&reflck->kref);
-	mutex_init(&reflck->lock);
-
-	return reflck;
-}
-
-static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck)
-{
-	kref_get(&reflck->kref);
-}
-
-static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
-{
-	struct vfio_pci_device *vdev = data;
-	struct vfio_pci_reflck **preflck = &vdev->reflck;
-	struct vfio_device *device;
-	struct vfio_pci_device *tmp;
-
-	device = vfio_device_get_from_dev(&pdev->dev);
-	if (!device)
-		return 0;
-
-	if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	tmp = vfio_device_data(device);
-
-	if (tmp->reflck) {
-		vfio_pci_reflck_get(tmp->reflck);
-		*preflck = tmp->reflck;
-		vfio_device_put(device);
-		return 1;
-	}
-
-	vfio_device_put(device);
-	return 0;
-}
-
-int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
-{
-	bool slot = !pci_probe_reset_slot(vdev->pdev->slot);
-
-	mutex_lock(&reflck_lock);
-
-	if (pci_is_root_bus(vdev->pdev->bus) ||
-	    vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find,
-					  vdev, slot) <= 0)
-		vdev->reflck = vfio_pci_reflck_alloc();
-
-	mutex_unlock(&reflck_lock);
-
-	return PTR_ERR_OR_ZERO(vdev->reflck);
-}
-
-static void vfio_pci_reflck_release(struct kref *kref)
-{
-	struct vfio_pci_reflck *reflck = container_of(kref,
-						      struct vfio_pci_reflck,
-						      kref);
-
-	kfree(reflck);
-	mutex_unlock(&reflck_lock);
-}
-
-void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
-{
-	kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock);
-}
-
-struct vfio_devices {
-	struct vfio_device **devices;
-	struct vfio_pci_device *vdev;
-	int cur_index;
-	int max_index;
-};
-
-static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
-{
-	struct vfio_devices *devs = data;
-	struct vfio_device *device;
-	struct vfio_pci_device *tmp;
-
-	if (devs->cur_index == devs->max_index)
-		return -ENOSPC;
-
-	device = vfio_device_get_from_dev(&pdev->dev);
-	if (!device)
-		return -EINVAL;
-
-	if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) {
-		vfio_device_put(device);
-		return -EBUSY;
-	}
-
-	tmp = vfio_device_data(device);
-
-	/* Fault if the device is not unused */
-	if (tmp->refcnt) {
-		vfio_device_put(device);
-		return -EBUSY;
-	}
-
-	devs->devices[devs->cur_index++] = device;
-	return 0;
-}
-
-/*
- * If a bus or slot reset is available for the provided device and:
- *  - All of the devices affected by that bus or slot reset are unused
- *    (!refcnt)
- *  - At least one of the affected devices is marked dirty via
- *    needs_reset (such as by lack of FLR support)
- * Then attempt to perform that bus or slot reset.  Callers are required
- * to hold vdev->reflck->lock, protecting the bus/slot reset group from
- * concurrent opens.  A vfio_device reference is acquired for each device
- * to prevent unbinds during the reset operation.
- *
- * NB: vfio-core considers a group to be viable even if some devices are
- * bound to drivers like pci-stub or pcieport.  Here we require all devices
- * to be bound to vfio_pci since that's the only way we can be sure they
- * stay put.
- */
-static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
-{
-	struct vfio_devices devs = { .cur_index = 0 };
-	int i = 0, ret = -EINVAL;
-	bool slot = false;
-	struct vfio_pci_device *tmp;
-
-	if (!pci_probe_reset_slot(vdev->pdev->slot))
-		slot = true;
-	else if (pci_probe_reset_bus(vdev->pdev->bus))
-		return;
-
-	if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
-					  &i, slot) || !i)
-		return;
-
-	devs.max_index = i;
-	devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL);
-	if (!devs.devices)
-		return;
-
-	devs.vdev = vdev;
-	if (vfio_pci_for_each_slot_or_bus(vdev->pdev,
-					  vfio_pci_get_unused_devs,
-					  &devs, slot))
-		goto put_devs;
-
-	/* Does at least one need a reset? */
-	for (i = 0; i < devs.cur_index; i++) {
-		tmp = vfio_device_data(devs.devices[i]);
-		if (tmp->needs_reset) {
-			ret = pci_reset_bus(vdev->pdev);
-			break;
-		}
-	}
-
-put_devs:
-	for (i = 0; i < devs.cur_index; i++) {
-		tmp = vfio_device_data(devs.devices[i]);
-
-		/*
-		 * If reset was successful, affected devices no longer need
-		 * a reset and we should return all the collateral devices
-		 * to low power.  If not successful, we either didn't reset
-		 * the bus or timed out waiting for it, so let's not touch
-		 * the power state.
-		 */
-		if (!ret) {
-			tmp->needs_reset = false;
-
-			if (tmp != vdev && !tmp->disable_idle_d3)
-				vfio_pci_set_power_state(tmp, PCI_D3hot);
-		}
-
-		vfio_device_put(devs.devices[i]);
-	}
-
-	kfree(devs.devices);
-}
-
 static void __exit vfio_pci_cleanup(void)
 {
 	pci_unregister_driver(&vfio_pci_driver);
 	vfio_pci_uninit_perm_bits();
 }
 
-void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
-{
-	char *p, *id;
-	int rc;
-
-	/* no ids passed actually */
-	if (ids[0] == '\0')
-		return;
-
-	/* add ids specified in the module parameter */
-	p = ids;
-	while ((id = strsep(&p, ","))) {
-		unsigned int vendor, device, subvendor = PCI_ANY_ID,
-			subdevice = PCI_ANY_ID, class = 0, class_mask = 0;
-		int fields;
-
-		if (!strlen(id))
-			continue;
-
-		fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
-				&vendor, &device, &subvendor, &subdevice,
-				&class, &class_mask);
-
-		if (fields < 2) {
-			pr_warn("invalid id string \"%s\"\n", id);
-			continue;
-		}
-
-		rc = pci_add_dynid(driver, vendor, device,
-				   subvendor, subdevice, class, class_mask, 0);
-		if (rc)
-			pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
-				vendor, device, subvendor, subdevice,
-				class, class_mask, rc);
-		else
-			pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n",
-				vendor, device, subvendor, subdevice,
-				class, class_mask);
-	}
-}
-
 static int __init vfio_pci_init(void)
 {
 	int ret;
diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c
index 4664733a..42b46d0 100644
--- a/drivers/vfio/pci/vfio_pci_common.c
+++ b/drivers/vfio/pci/vfio_pci_common.c
@@ -1231,7 +1231,7 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 	return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
-static const struct pci_error_handlers vfio_err_handlers = {
+const struct pci_error_handlers vfio_err_handlers = {
 	.error_detected = vfio_pci_aer_err_detected,
 };
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 670268e..5d17394 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -135,6 +135,8 @@ struct vfio_pci_device {
 #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
 #define irq_is(vdev, type) (vdev->irq_type == type)
 
+extern const struct pci_error_handlers vfio_err_handlers;
+
 static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 {
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (6 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 07/10] vfio_pci: shrink vfio_pci.c Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-12-15 22:46   ` Alex Williamson
  2019-11-21 11:23 ` [PATCH v3 09/10] samples: add vfio-mdev-pci driver Liu Yi L
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

This patch add a user numer track for the shared cap/ecap_perms bits,
and the alloc/free will hold a semaphore to protect the operations.
With the changes, first caller of vfio_pci_init_perm_bits() will
initialize the bits. While the last caller of vfio_pci_uninit_perm_bits()
will free the bits. This is a preparation to have multiple cap/ecap_perms
bits users.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index f0891bd..274c993 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -36,6 +36,13 @@
 	 (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
 
 /*
+ * vfio_perm_bits_sem: prorects the shared perm_bits alloc/free
+ * vfio_pci_perm_bits_users: tracks the user of the shared perm_bits
+ */
+static DEFINE_SEMAPHORE(vfio_perm_bits_sem);
+static int vfio_pci_perm_bits_users;
+
+/*
  * Lengths of PCI Config Capabilities
  *   0: Removed from the user visible capability list
  *   FF: Variable length
@@ -995,7 +1002,7 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 /*
  * Initialize the shared permission tables
  */
-void vfio_pci_uninit_perm_bits(void)
+static void vfio_pci_uninit_perm_bits_internal(void)
 {
 	free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
 
@@ -1009,10 +1016,30 @@ void vfio_pci_uninit_perm_bits(void)
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
 }
 
+void vfio_pci_uninit_perm_bits(void)
+{
+	down(&vfio_perm_bits_sem);
+
+	if (--vfio_pci_perm_bits_users > 0)
+		goto out;
+
+	vfio_pci_uninit_perm_bits_internal();
+
+out:
+	up(&vfio_perm_bits_sem);
+}
+
 int __init vfio_pci_init_perm_bits(void)
 {
 	int ret;
 
+	down(&vfio_perm_bits_sem);
+
+	if (++vfio_pci_perm_bits_users > 1) {
+		ret = 0;
+		goto out;
+	}
+
 	/* Basic config space */
 	ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);
 
@@ -1030,8 +1057,10 @@ int __init vfio_pci_init_perm_bits(void)
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 
 	if (ret)
-		vfio_pci_uninit_perm_bits();
+		vfio_pci_uninit_perm_bits_internal();
 
+out:
+	up(&vfio_perm_bits_sem);
 	return ret;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 09/10] samples: add vfio-mdev-pci driver
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (7 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-11-21 11:23 ` [PATCH v3 10/10] samples: refine " Liu Yi L
  2019-12-09 11:49 ` [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu, Yi L
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm,
	Masahiro Yamada

This patch adds sample driver named vfio-mdev-pci. It is to wrap
a PCI device as a mediated device. For a pci device, once bound
to vfio-mdev-pci driver, user space access of this device will
go through vfio mdev framework. The usage of the device follows
mdev management method. e.g. user should create a mdev before
exposing the device to user-space.

Benefit of this new driver would be acting as a sample driver
for recent changes from "vfio/mdev: IOMMU aware mediated device"
patchset. Also it could be a good experiment driver for future
device specific mdev migration support. This sample driver only
supports singleton iommu groups, for non-singleton iommu groups,
this sample driver doesn't work. It will fail when trying to assign
the non-singleton iommu group to VMs.

To use this driver:
a) build and load vfio-mdev-pci.ko module
   execute "make menuconfig" and config CONFIG_SAMPLE_VFIO_MDEV_PCI
   then load it with following command
   > sudo modprobe vfio
   > sudo modprobe vfio-pci
   > sudo insmod drivers/vfio/pci/vfio-mdev-pci.ko

b) unbind original device driver
   e.g. use following command to unbind its original driver
   > echo $dev_bdf > /sys/bus/pci/devices/$dev_bdf/driver/unbind

c) bind vfio-mdev-pci driver to the physical device
   > echo $vend_id $dev_id > /sys/bus/pci/drivers/vfio-mdev-pci/new_id

d) check the supported mdev instances
   > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/
     vfio-mdev-pci-type_name
   > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
     vfio-mdev-pci-type_name/
     available_instances  create  device_api  devices  name

e)  create mdev on this physical device (only 1 instance)
   > echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1003" > \
     /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
     vfio-mdev-pci-type_name/create

f) passthru the mdev to guest
   add the following line in QEMU boot command
   -device vfio-pci,\
    sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003

g) destroy mdev
   > echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003/\
     remove

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/Makefile        |   6 +
 drivers/vfio/pci/vfio_mdev_pci.c | 407 +++++++++++++++++++++++++++++++++++++++
 samples/Kconfig                  |  11 ++
 3 files changed, 424 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index d94317a..ac118ef 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -5,4 +5,10 @@ vfio-pci-y := vfio_pci.o vfio_pci_common.o vfio_pci_intrs.o \
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 
+vfio-mdev-pci-y := vfio_mdev_pci.o vfio_pci_common.o vfio_pci_intrs.o \
+			vfio_pci_rdwr.o vfio_pci_config.o
+vfio-mdev-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
+vfio-mdev-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o
diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c
new file mode 100644
index 0000000..1eb4ddc
--- /dev/null
+++ b/drivers/vfio/pci/vfio_mdev_pci.c
@@ -0,0 +1,407 @@
+/*
+ * Copyright © 2019 Intel Corporation.
+ *     Author: Liu Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio_pci.c:
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vgaarb.h>
+#include <linux/nospec.h>
+#include <linux/mdev.h>
+
+#include "vfio_pci_private.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
+#define DRIVER_DESC     "VFIO Mdev PCI - Sample driver for PCI device as a mdev"
+
+#define VFIO_MDEV_PCI_NAME  "vfio-mdev-pci"
+
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio-mdev-pci driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
+
+static bool nointxmask;
+module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(nointxmask,
+		  "Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.");
+
+#ifdef CONFIG_VFIO_PCI_VGA
+static bool disable_vga;
+module_param(disable_vga, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-mdev-pci");
+#endif
+
+static bool disable_idle_d3;
+module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_idle_d3,
+		 "Disable using the PCI D3 low power state for idle, unused devices");
+
+static struct pci_driver vfio_mdev_pci_driver;
+
+static ssize_t
+name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	return sprintf(buf, "%s-type1\n", dev_name(dev));
+}
+
+MDEV_TYPE_ATTR_RO(name);
+
+static ssize_t
+available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	return sprintf(buf, "%d\n", 1);
+}
+
+MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+		char *buf)
+{
+	return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
+}
+
+MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *vfio_mdev_pci_types_attrs[] = {
+	&mdev_type_attr_name.attr,
+	&mdev_type_attr_device_api.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+static struct attribute_group vfio_mdev_pci_type_group1 = {
+	.name  = "type1",
+	.attrs = vfio_mdev_pci_types_attrs,
+};
+
+struct attribute_group *vfio_mdev_pci_type_groups[] = {
+	&vfio_mdev_pci_type_group1,
+	NULL,
+};
+
+struct vfio_mdev_pci {
+	struct vfio_pci_device *vdev;
+	struct mdev_device *mdev;
+	unsigned long handle;
+};
+
+static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct device *pdev;
+	struct vfio_pci_device *vdev;
+	struct vfio_mdev_pci *pmdev;
+	int ret;
+
+	pdev = mdev_parent_dev(mdev);
+	vdev = dev_get_drvdata(pdev);
+	pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL);
+	if (pmdev == NULL) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	pmdev->mdev = mdev;
+	pmdev->vdev = vdev;
+	mdev_set_drvdata(mdev, pmdev);
+	ret = mdev_set_iommu_device(mdev_dev(mdev), pdev);
+	if (ret) {
+		pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n",
+			__func__, dev_name(mdev_dev(mdev)), dev_name(pdev));
+		goto out;
+	}
+
+	pr_info("%s, creation succeeded for mdev: %s\n", __func__,
+		     dev_name(mdev_dev(mdev)));
+out:
+	return ret;
+}
+
+static int vfio_mdev_pci_remove(struct mdev_device *mdev)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	kfree(pmdev);
+	pr_info("%s, succeeded for mdev: %s\n", __func__,
+		     dev_name(mdev_dev(mdev)));
+
+	return 0;
+}
+
+static int vfio_mdev_pci_open(struct mdev_device *mdev)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+	struct vfio_pci_device *vdev = pmdev->vdev;
+	int ret = 0;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3);
+
+	mutex_lock(&vdev->reflck->lock);
+
+	if (!vdev->refcnt) {
+		ret = vfio_pci_enable(vdev);
+		if (ret)
+			goto error;
+
+		vfio_spapr_pci_eeh_open(vdev->pdev);
+	}
+	vdev->refcnt++;
+error:
+	mutex_unlock(&vdev->reflck->lock);
+	if (!ret)
+		pr_info("Succeeded to open mdev: %s on pf: %s\n",
+		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
+	else {
+		pr_info("Failed to open mdev: %s on pf: %s\n",
+		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
+		module_put(THIS_MODULE);
+	}
+	return ret;
+}
+
+static void vfio_mdev_pci_release(struct mdev_device *mdev)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+	struct vfio_pci_device *vdev = pmdev->vdev;
+
+	pr_info("Release mdev: %s on pf: %s\n",
+		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
+
+	mutex_lock(&vdev->reflck->lock);
+
+	if (!(--vdev->refcnt)) {
+		vfio_spapr_pci_eeh_release(vdev->pdev);
+		vfio_pci_disable(vdev);
+	}
+
+	mutex_unlock(&vdev->reflck->lock);
+
+	module_put(THIS_MODULE);
+}
+
+static long vfio_mdev_pci_ioctl(struct mdev_device *mdev, unsigned int cmd,
+			     unsigned long arg)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_ioctl(pmdev->vdev, cmd, arg);
+}
+
+static int vfio_mdev_pci_mmap(struct mdev_device *mdev,
+				struct vm_area_struct *vma)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_mmap(pmdev->vdev, vma);
+}
+
+static ssize_t vfio_mdev_pci_read(struct mdev_device *mdev, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_read(pmdev->vdev, buf, count, ppos);
+}
+
+static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev,
+				const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos);
+}
+
+static const struct mdev_parent_ops vfio_mdev_pci_ops = {
+	.supported_type_groups	= vfio_mdev_pci_type_groups,
+	.create			= vfio_mdev_pci_create,
+	.remove			= vfio_mdev_pci_remove,
+
+	.open			= vfio_mdev_pci_open,
+	.release		= vfio_mdev_pci_release,
+
+	.read			= vfio_mdev_pci_read,
+	.write			= vfio_mdev_pci_write,
+	.mmap			= vfio_mdev_pci_mmap,
+	.ioctl			= vfio_mdev_pci_ioctl,
+};
+
+static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
+				       const struct pci_device_id *id)
+{
+	struct vfio_pci_device *vdev;
+	int ret;
+
+	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
+		return -EINVAL;
+
+	/*
+	 * Prevent binding to PFs with VFs enabled, this too easily allows
+	 * userspace instance with VFs and PFs from the same device, which
+	 * cannot work.  Disabling SR-IOV here would initiate removing the
+	 * VFs, which would unbind the driver, which is prone to blocking
+	 * if that VF is also in use by vfio-pci or vfio-mdev-pci. Just
+	 * reject these PFs and let the user sort it out.
+	 */
+	if (pci_num_vf(pdev)) {
+		pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
+		return -EBUSY;
+	}
+
+	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+	if (!vdev)
+		return -ENOMEM;
+
+	vdev->pdev = pdev;
+	vdev->irq_type = VFIO_PCI_NUM_IRQS;
+	mutex_init(&vdev->igate);
+	spin_lock_init(&vdev->irqlock);
+	mutex_init(&vdev->ioeventfds_lock);
+	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	vdev->nointxmask = nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	vdev->disable_vga = disable_vga;
+#endif
+	vdev->disable_idle_d3 = disable_idle_d3;
+
+	pci_set_drvdata(pdev, vdev);
+
+	ret = vfio_pci_reflck_attach(vdev);
+	if (ret) {
+		pci_set_drvdata(pdev, NULL);
+		kfree(vdev);
+		return ret;
+	}
+
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
+		vga_set_legacy_decoding(pdev,
+					vfio_pci_set_vga_decode(vdev, false));
+	}
+
+	vfio_pci_probe_power_state(vdev);
+
+	if (!vdev->disable_idle_d3) {
+		/*
+		 * pci-core sets the device power state to an unknown value at
+		 * bootup and after being removed from a driver.  The only
+		 * transition it allows from this unknown state is to D0, which
+		 * typically happens when a driver calls pci_enable_device().
+		 * We're not ready to enable the device yet, but we do want to
+		 * be able to get to D3.  Therefore first do a D0 transition
+		 * before going to D3.
+		 */
+		vfio_pci_set_power_state(vdev, PCI_D0);
+		vfio_pci_set_power_state(vdev, PCI_D3hot);
+	}
+
+	ret = mdev_register_device(&pdev->dev, &vfio_mdev_pci_ops);
+	if (ret)
+		pr_err("Cannot register mdev for device %s\n",
+			dev_name(&pdev->dev));
+	else
+		pr_info("Wrap device %s as a mdev\n", dev_name(&pdev->dev));
+
+	return ret;
+}
+
+static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
+{
+	struct vfio_pci_device *vdev;
+
+	vdev = pci_get_drvdata(pdev);
+	if (!vdev)
+		return;
+
+	vfio_pci_reflck_put(vdev->reflck);
+
+	kfree(vdev->region);
+	mutex_destroy(&vdev->ioeventfds_lock);
+
+	if (!disable_idle_d3)
+		vfio_pci_set_power_state(vdev, PCI_D0);
+
+	kfree(vdev->pm_save);
+
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, NULL, NULL, NULL);
+		vga_set_legacy_decoding(pdev,
+				VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+				VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
+	}
+
+	kfree(vdev);
+}
+
+static struct pci_driver vfio_mdev_pci_driver = {
+	.name		= VFIO_MDEV_PCI_NAME,
+	.id_table	= NULL, /* only dynamic ids */
+	.probe		= vfio_mdev_pci_driver_probe,
+	.remove		= vfio_mdev_pci_driver_remove,
+	.err_handler	= &vfio_err_handlers,
+};
+
+static void __exit vfio_mdev_pci_cleanup(void)
+{
+	pci_unregister_driver(&vfio_mdev_pci_driver);
+	vfio_pci_uninit_perm_bits();
+}
+
+static int __init vfio_mdev_pci_init(void)
+{
+	int ret;
+
+	/* Allocate shared config space permision data used by all devices */
+	ret = vfio_pci_init_perm_bits();
+	if (ret)
+		return ret;
+
+	/* Register and scan for devices */
+	ret = pci_register_driver(&vfio_mdev_pci_driver);
+	if (ret)
+		goto out_driver;
+
+	vfio_pci_fill_ids(ids, &vfio_mdev_pci_driver);
+
+	return 0;
+out_driver:
+	vfio_pci_uninit_perm_bits();
+	return ret;
+}
+
+module_init(vfio_mdev_pci_init);
+module_exit(vfio_mdev_pci_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4..1513fef 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -169,4 +169,15 @@ config SAMPLE_VFS
 	  as mount API and statx().  Note that this is restricted to the x86
 	  arch whilst it accesses system calls that aren't yet in all arches.
 
+config SAMPLE_VFIO_MDEV_PCI
+	tristate "Sample driver for wrapping PCI device as a mdev"
+	depends on PCI && EVENTFD && VFIO_MDEV_DEVICE
+	select VFIO_VIRQFD
+	select IRQ_BYPASS_MANAGER
+	help
+	  Sample driver for wrapping a PCI device as a mdev. Once bound to
+	  this driver, device passthru should through mdev path.
+
+	  If you don't know what to do here, say N.
+
 endif # SAMPLES
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 10/10] samples: refine vfio-mdev-pci driver
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (8 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 09/10] samples: add vfio-mdev-pci driver Liu Yi L
@ 2019-11-21 11:23 ` Liu Yi L
  2019-12-09 11:49 ` [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu, Yi L
  10 siblings, 0 replies; 18+ messages in thread
From: Liu Yi L @ 2019-11-21 11:23 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: kevin.tian, baolu.lu, yi.l.liu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

From: Alex Williamson <alex.williamson@redhat.com>

This patch refines the implementation of original vfio-mdev-pci driver.

And the vfio-mdev-pci-type_name will be named per the following rule:

	vmdev->attr.name = kasprintf(GFP_KERNEL,
				     "%04x:%04x:%04x:%04x:%06x:%02x",
				     pdev->vendor, pdev->device,
				     pdev->subsystem_vendor,
				     pdev->subsystem_device, pdev->class,
				     pdev->revision);

Before usage, check the /sys/bus/pci/devices/$bdf/mdev_supported_types/
to ensure the final mdev_supported_types for the specific device.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_mdev_pci.c | 123 +++++++++++++++++++++++----------------
 1 file changed, 73 insertions(+), 50 deletions(-)

diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c
index 1eb4ddc..a1f62cf 100644
--- a/drivers/vfio/pci/vfio_mdev_pci.c
+++ b/drivers/vfio/pci/vfio_mdev_pci.c
@@ -65,18 +65,22 @@ MODULE_PARM_DESC(disable_idle_d3,
 
 static struct pci_driver vfio_mdev_pci_driver;
 
-static ssize_t
-name_show(struct kobject *kobj, struct device *dev, char *buf)
-{
-	return sprintf(buf, "%s-type1\n", dev_name(dev));
-}
-
-MDEV_TYPE_ATTR_RO(name);
+struct vfio_mdev_pci_device {
+	struct vfio_pci_device vdev;
+	struct mdev_parent_ops ops;
+	struct attribute_group *groups[2];
+	struct attribute_group attr;
+	atomic_t avail;
+};
 
 static ssize_t
 available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
 {
-	return sprintf(buf, "%d\n", 1);
+	struct vfio_mdev_pci_device *vmdev;
+
+	vmdev = pci_get_drvdata(to_pci_dev(dev));
+
+	return sprintf(buf, "%d\n", atomic_read(&vmdev->avail));
 }
 
 MDEV_TYPE_ATTR_RO(available_instances);
@@ -90,64 +94,61 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
 MDEV_TYPE_ATTR_RO(device_api);
 
 static struct attribute *vfio_mdev_pci_types_attrs[] = {
-	&mdev_type_attr_name.attr,
 	&mdev_type_attr_device_api.attr,
 	&mdev_type_attr_available_instances.attr,
 	NULL,
 };
 
-static struct attribute_group vfio_mdev_pci_type_group1 = {
-	.name  = "type1",
-	.attrs = vfio_mdev_pci_types_attrs,
-};
-
-struct attribute_group *vfio_mdev_pci_type_groups[] = {
-	&vfio_mdev_pci_type_group1,
-	NULL,
-};
-
 struct vfio_mdev_pci {
 	struct vfio_pci_device *vdev;
 	struct mdev_device *mdev;
-	unsigned long handle;
 };
 
 static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev)
 {
 	struct device *pdev;
-	struct vfio_pci_device *vdev;
+	struct vfio_mdev_pci_device *vmdev;
 	struct vfio_mdev_pci *pmdev;
 	int ret;
 
 	pdev = mdev_parent_dev(mdev);
-	vdev = dev_get_drvdata(pdev);
+	vmdev = dev_get_drvdata(pdev);
+
+	if (atomic_dec_if_positive(&vmdev->avail) < 0)
+		return -ENOSPC;
+
 	pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL);
-	if (pmdev == NULL) {
-		ret = -EBUSY;
-		goto out;
+	if (!pmdev) {
+		atomic_inc(&vmdev->avail);
+		return -ENOMEM;
 	}
 
 	pmdev->mdev = mdev;
-	pmdev->vdev = vdev;
+	pmdev->vdev = &vmdev->vdev;
 	mdev_set_drvdata(mdev, pmdev);
 	ret = mdev_set_iommu_device(mdev_dev(mdev), pdev);
 	if (ret) {
 		pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n",
 			__func__, dev_name(mdev_dev(mdev)), dev_name(pdev));
-		goto out;
+		kfree(pmdev);
+		atomic_inc(&vmdev->avail);
+		return ret;
 	}
 
 	pr_info("%s, creation succeeded for mdev: %s\n", __func__,
 		     dev_name(mdev_dev(mdev)));
-out:
-	return ret;
+	return 0;
 }
 
 static int vfio_mdev_pci_remove(struct mdev_device *mdev)
 {
 	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+	struct vfio_mdev_pci_device *vmdev;
+
+	vmdev = container_of(pmdev->vdev, struct vfio_mdev_pci_device, vdev);
 
 	kfree(pmdev);
+	atomic_inc(&vmdev->avail);
 	pr_info("%s, succeeded for mdev: %s\n", __func__,
 		     dev_name(mdev_dev(mdev)));
 
@@ -241,24 +242,12 @@ static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev,
 	return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos);
 }
 
-static const struct mdev_parent_ops vfio_mdev_pci_ops = {
-	.supported_type_groups	= vfio_mdev_pci_type_groups,
-	.create			= vfio_mdev_pci_create,
-	.remove			= vfio_mdev_pci_remove,
-
-	.open			= vfio_mdev_pci_open,
-	.release		= vfio_mdev_pci_release,
-
-	.read			= vfio_mdev_pci_read,
-	.write			= vfio_mdev_pci_write,
-	.mmap			= vfio_mdev_pci_mmap,
-	.ioctl			= vfio_mdev_pci_ioctl,
-};
-
 static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
 				       const struct pci_device_id *id)
 {
+	struct vfio_mdev_pci_device *vmdev;
 	struct vfio_pci_device *vdev;
+	const struct mdev_parent_ops *ops;
 	int ret;
 
 	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
@@ -277,10 +266,38 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
 		return -EBUSY;
 	}
 
-	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
-	if (!vdev)
+	vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL);
+	if (!vmdev)
 		return -ENOMEM;
 
+	vmdev->attr.name = kasprintf(GFP_KERNEL,
+				     "%04x:%04x:%04x:%04x:%06x:%02x",
+				     pdev->vendor, pdev->device,
+				     pdev->subsystem_vendor,
+				     pdev->subsystem_device, pdev->class,
+				     pdev->revision);
+	if (!vmdev->attr.name) {
+		kfree(vmdev);
+		return -ENOMEM;
+	}
+
+	atomic_set(&vmdev->avail, 1);
+
+	vmdev->attr.attrs = vfio_mdev_pci_types_attrs;
+	vmdev->groups[0] = &vmdev->attr;
+
+	vmdev->ops.supported_type_groups = vmdev->groups;
+	vmdev->ops.create = vfio_mdev_pci_create;
+	vmdev->ops.remove = vfio_mdev_pci_remove;
+	vmdev->ops.open	= vfio_mdev_pci_open;
+	vmdev->ops.release = vfio_mdev_pci_release;
+	vmdev->ops.read = vfio_mdev_pci_read;
+	vmdev->ops.write = vfio_mdev_pci_write;
+	vmdev->ops.mmap = vfio_mdev_pci_mmap;
+	vmdev->ops.ioctl = vfio_mdev_pci_ioctl;
+	ops = &vmdev->ops;
+
+	vdev = &vmdev->vdev;
 	vdev->pdev = pdev;
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	mutex_init(&vdev->igate);
@@ -293,7 +310,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
 #endif
 	vdev->disable_idle_d3 = disable_idle_d3;
 
-	pci_set_drvdata(pdev, vdev);
+	pci_set_drvdata(pdev, vmdev);
 
 	ret = vfio_pci_reflck_attach(vdev);
 	if (ret) {
@@ -324,7 +341,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
 		vfio_pci_set_power_state(vdev, PCI_D3hot);
 	}
 
-	ret = mdev_register_device(&pdev->dev, &vfio_mdev_pci_ops);
+	ret = mdev_register_device(&pdev->dev, ops);
 	if (ret)
 		pr_err("Cannot register mdev for device %s\n",
 			dev_name(&pdev->dev));
@@ -336,12 +353,17 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
 
 static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
 {
+	struct vfio_mdev_pci_device *vmdev;
 	struct vfio_pci_device *vdev;
 
-	vdev = pci_get_drvdata(pdev);
-	if (!vdev)
+	mdev_unregister_device(&pdev->dev);
+
+	vmdev = pci_get_drvdata(pdev);
+	if (!vmdev)
 		return;
 
+	vdev = &vmdev->vdev;
+
 	vfio_pci_reflck_put(vdev->reflck);
 
 	kfree(vdev->region);
@@ -359,7 +381,8 @@ static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
 				VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
 	}
 
-	kfree(vdev);
+	kfree(vmdev->attr.name);
+	kfree(vmdev);
 }
 
 static struct pci_driver vfio_mdev_pci_driver = {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device
  2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
                   ` (9 preceding siblings ...)
  2019-11-21 11:23 ` [PATCH v3 10/10] samples: refine " Liu Yi L
@ 2019-12-09 11:49 ` Liu, Yi L
  10 siblings, 0 replies; 18+ messages in thread
From: Liu, Yi L @ 2019-12-09 11:49 UTC (permalink / raw)
  To: alex.williamson, kwankhede
  Cc: Tian, Kevin, baolu.lu, Sun, Yi Y, joro, jean-philippe.brucker,
	peterx, linux-kernel, kvm

Hi Alex,

Any comments on this version? If any, please feel free let me know.

Regards,
Yi Liu

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, November 21, 2019 7:24 PM
> To: alex.williamson@redhat.com; kwankhede@nvidia.com
> Subject: [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device
> 
> This patchset aims to add a vfio-pci-like meta driver as a demo
> user of the vfio changes introduced in "vfio/mdev: IOMMU aware
> mediated device" patchset from Baolu Lu. Besides the test purpose,
> per Alex's comments, it could also be a good base driver for
> experimenting with device specific mdev migration.
> 
> Specific interface tested in this proposal:
>  *) int mdev_set_iommu_device(struct device *dev,
>  				struct device *iommu_device)
>     introduced in the patch as below:
>     "[PATCH v5 6/8] vfio/mdev: Add iommu related member in mdev_device"
> 
> Patch Overview:
>  *) patch 1 ~ 7: code refactor for existing vfio-pci module
>                  move the common codes from vfio_pci.c to
>                  vfio_pci_common.c
>  *) patch 8: add protection to perm_bits alloc/free
>  *) patch 9: add vfio-mdev-pci sample driver
>  *) patch 10: refine the sample driver
> 
> Links:
>  *) Link of "vfio/mdev: IOMMU aware mediated device"
>          https://lwn.net/Articles/780522/
>  *) Previous versions:
>          Patch v2: https://lkml.org/lkml/2019/9/6/115
>          Patch v1: https://www.spinics.net/lists/kvm/msg188952.html
>          RFC v3: https://lkml.org/lkml/2019/4/24/495
>          RFC v2: https://lkml.org/lkml/2019/3/13/113
>          RFC v1: https://lkml.org/lkml/2019/3/4/529
>  *) may try it with the codes in below repo
>     https://github.com/luxis1999/vfio-mdev-pci-sample-driver.git : v5.4-rc7-pci-mdev
> 
> Please feel free give your comments.
> 
> Thanks,
> Yi Liu
> 
> Change log:
>   patch v2 -> patch v3:
>   - refresh the disable_idle_d3, disable_vga and nointxmask config
>     according to user config in device open.
>   - add a semaphore around the vfio-pci cap/ecap perm bits allocation/free
>   - drop the non-singleton iommu group support to keep it simple as it's
>     a sample driver for now.
> 
>   patch v1 -> patch v2:
>   - the sample driver implementation refined
>   - the sample driver can work on non-singleton iommu groups
>   - the sample driver can work with vfio-pci, devices from a non-singleton
>     group can either be bound to vfio-mdev-pci or vfio-pci, and the
>     assignment of this group still follows current vfio assignment rule.
> 
>   RFC v3 -> patch v1:
>   - split the patchset from 3 patches to 9 patches to better demonstrate
>     the changes step by step
> 
>   rfc v2->v3:
>   - use vfio-mdev-pci instead of vfio-pci-mdev
>   - place the new driver under drivers/vfio/pci while define
>     Kconfig in samples/Kconfig to clarify it is a sample driver
> 
>   rfc v1->v2:
>   - instead of adding kernel option to existing vfio-pci
>     module in v1, v2 follows Alex's suggestion to add a
>     separate vfio-pci-mdev module.
>   - new patchset subject: "vfio/pci: wrap pci device as a mediated device"
> 
> 
> Alex Williamson (1):
>   samples: refine vfio-mdev-pci driver
> 
> Liu Yi L (9):
>   vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header
>   vfio_pci: refine user config reference in vfio-pci module
>   vfio_pci: refine vfio_pci_driver reference in vfio_pci.c
>   vfio_pci: make common functions be extern
>   vfio_pci: duplicate vfio_pci.c
>   vfio_pci: shrink vfio_pci_common.c
>   vfio_pci: shrink vfio_pci.c
>   vfio/pci: protect cap/ecap_perm bits alloc/free
>   samples: add vfio-mdev-pci driver
> 
>  drivers/vfio/pci/Makefile           |    9 +-
>  drivers/vfio/pci/vfio_mdev_pci.c    |  430 ++++++++++
>  drivers/vfio/pci/vfio_pci.c         | 1460 +---------------------------------
>  drivers/vfio/pci/vfio_pci_common.c  | 1471
> +++++++++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_config.c  |   33 +-
>  drivers/vfio/pci/vfio_pci_private.h |   39 +
>  samples/Kconfig                     |   11 +
>  7 files changed, 2000 insertions(+), 1453 deletions(-)
>  create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c
>  create mode 100644 drivers/vfio/pci/vfio_pci_common.c
> 
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
  2019-11-21 11:23 ` [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free Liu Yi L
@ 2019-12-15 22:46   ` Alex Williamson
  2019-12-16 11:57     ` Liu, Yi L
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2019-12-15 22:46 UTC (permalink / raw)
  To: Liu Yi L
  Cc: kwankhede, kevin.tian, baolu.lu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

On Thu, 21 Nov 2019 19:23:45 +0800
Liu Yi L <yi.l.liu@intel.com> wrote:

> This patch add a user numer track for the shared cap/ecap_perms bits,
> and the alloc/free will hold a semaphore to protect the operations.
> With the changes, first caller of vfio_pci_init_perm_bits() will
> initialize the bits. While the last caller of vfio_pci_uninit_perm_bits()
> will free the bits. This is a preparation to have multiple cap/ecap_perms
> bits users.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index f0891bd..274c993 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -36,6 +36,13 @@
>  	 (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
>  
>  /*
> + * vfio_perm_bits_sem: prorects the shared perm_bits alloc/free
> + * vfio_pci_perm_bits_users: tracks the user of the shared perm_bits
> + */
> +static DEFINE_SEMAPHORE(vfio_perm_bits_sem);
> +static int vfio_pci_perm_bits_users;
> +
> +/*
>   * Lengths of PCI Config Capabilities
>   *   0: Removed from the user visible capability list
>   *   FF: Variable length
> @@ -995,7 +1002,7 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
>  /*
>   * Initialize the shared permission tables
>   */
> -void vfio_pci_uninit_perm_bits(void)
> +static void vfio_pci_uninit_perm_bits_internal(void)
>  {
>  	free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
>  
> @@ -1009,10 +1016,30 @@ void vfio_pci_uninit_perm_bits(void)
>  	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
>  }
>  
> +void vfio_pci_uninit_perm_bits(void)
> +{
> +	down(&vfio_perm_bits_sem);
> +
> +	if (--vfio_pci_perm_bits_users > 0)
> +		goto out;
> +
> +	vfio_pci_uninit_perm_bits_internal();
> +
> +out:
> +	up(&vfio_perm_bits_sem);
> +}
> +
>  int __init vfio_pci_init_perm_bits(void)
>  {
>  	int ret;
>  
> +	down(&vfio_perm_bits_sem);
> +
> +	if (++vfio_pci_perm_bits_users > 1) {
> +		ret = 0;
> +		goto out;
> +	}
> +
>  	/* Basic config space */
>  	ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);
>  
> @@ -1030,8 +1057,10 @@ int __init vfio_pci_init_perm_bits(void)
>  	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
>  
>  	if (ret)
> -		vfio_pci_uninit_perm_bits();
> +		vfio_pci_uninit_perm_bits_internal();
>  
> +out:
> +	up(&vfio_perm_bits_sem);
>  	return ret;
>  }
>  

Hi Yi,

Sorry for slowness in providing feedback on this series.  If we
provided a vfio-pci-common module that vfio-pci and vfio-mdev-pci
depend on, doesn't this entire problem go away?  I played a little bit
with this in the crude patch below, it seems to work.  To finish this,
I think we'd move the function declarations out of the "private" header
file and into one under include/linux, then we could also move
vfio_mdev_pci.c to the samples directory like we intended originally.
I know you had tried to link things from samples and it didn't work,
but is the below a better attempt at resolving this?  It commits us to
exporting a bunch of functions, we'll need to decide whether that's a
good idea.  Thanks,

Alex

(patch applies to 5.4 + this series, thus includes a revert of this patch)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index ac3c1dd3edef..1a1fb3b7fd46 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -1,9 +1,14 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config VFIO_PCI
-	tristate "VFIO support for PCI devices"
+
+config VFIO_PCI_COMMON
 	depends on VFIO && PCI && EVENTFD
 	select VFIO_VIRQFD
 	select IRQ_BYPASS_MANAGER
+	tristate
+
+config VFIO_PCI
+	tristate "VFIO support for PCI devices"
+	select VFIO_PCI_COMMON
 	help
 	  Support for the PCI VFIO bus driver.  This is required to make
 	  use of PCI drivers using the VFIO framework.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index ac118ef246a4..9f599cb6e207 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,14 +1,13 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-vfio-pci-y := vfio_pci.o vfio_pci_common.o vfio_pci_intrs.o \
+vfio-pci-common-y := vfio_pci_common.o vfio_pci_intrs.o \
 		vfio_pci_rdwr.o vfio_pci_config.o
-vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+vfio-pci-common-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
+vfio-pci-common-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 
-vfio-mdev-pci-y := vfio_mdev_pci.o vfio_pci_common.o vfio_pci_intrs.o \
-			vfio_pci_rdwr.o vfio_pci_config.o
-vfio-mdev-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-mdev-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+vfio-pci-y := vfio_pci.o
+vfio-mdev-pci-y := vfio_mdev_pci.o
 
+obj-$(CONFIG_VFIO_PCI_COMMON) += vfio-pci-common.o
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
 obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o
diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c
index a1f62cf30d08..e6b070ff84ac 100644
--- a/drivers/vfio/pci/vfio_mdev_pci.c
+++ b/drivers/vfio/pci/vfio_mdev_pci.c
@@ -390,35 +390,26 @@ static struct pci_driver vfio_mdev_pci_driver = {
 	.id_table	= NULL, /* only dynamic ids */
 	.probe		= vfio_mdev_pci_driver_probe,
 	.remove		= vfio_mdev_pci_driver_remove,
-	.err_handler	= &vfio_err_handlers,
+	.err_handler	= &vfio_pci_err_handlers,
 };
 
 static void __exit vfio_mdev_pci_cleanup(void)
 {
 	pci_unregister_driver(&vfio_mdev_pci_driver);
-	vfio_pci_uninit_perm_bits();
 }
 
 static int __init vfio_mdev_pci_init(void)
 {
 	int ret;
 
-	/* Allocate shared config space permision data used by all devices */
-	ret = vfio_pci_init_perm_bits();
-	if (ret)
-		return ret;
-
 	/* Register and scan for devices */
 	ret = pci_register_driver(&vfio_mdev_pci_driver);
 	if (ret)
-		goto out_driver;
+		return ret;
 
 	vfio_pci_fill_ids(ids, &vfio_mdev_pci_driver);
 
 	return 0;
-out_driver:
-	vfio_pci_uninit_perm_bits();
-	return ret;
 }
 
 module_init(vfio_mdev_pci_init);
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7e24da25f176..704766714c11 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -225,36 +225,26 @@ static struct pci_driver vfio_pci_driver = {
 	.id_table	= NULL, /* only dynamic ids */
 	.probe		= vfio_pci_probe,
 	.remove		= vfio_pci_remove,
-	.err_handler	= &vfio_err_handlers,
+	.err_handler	= &vfio_pci_err_handlers,
 };
 
 static void __exit vfio_pci_cleanup(void)
 {
 	pci_unregister_driver(&vfio_pci_driver);
-	vfio_pci_uninit_perm_bits();
 }
 
 static int __init vfio_pci_init(void)
 {
 	int ret;
 
-	/* Allocate shared config space permision data used by all devices */
-	ret = vfio_pci_init_perm_bits();
-	if (ret)
-		return ret;
-
 	/* Register and scan for devices */
 	ret = pci_register_driver(&vfio_pci_driver);
 	if (ret)
-		goto out_driver;
+		return ret;
 
 	vfio_pci_fill_ids(ids, &vfio_pci_driver);
 
 	return 0;
-
-out_driver:
-	vfio_pci_uninit_perm_bits();
-	return ret;
 }
 
 module_init(vfio_pci_init);
diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c
index 42b46d040cb4..dd6f8fd208ce 100644
--- a/drivers/vfio/pci/vfio_pci_common.c
+++ b/drivers/vfio/pci/vfio_pci_common.c
@@ -30,6 +30,10 @@
 
 #include "vfio_pci_private.h"
 
+#define DRIVER_VERSION  "0.2"
+#define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
+#define DRIVER_DESC     "VFIO PCI Common"
+
 /*
  * Our VGA arbiter participation is limited since we don't know anything
  * about the device itself.  However, if the device is the only VGA device
@@ -69,6 +73,7 @@ unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
 
 	return decodes;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_set_vga_decode);
 
 static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 {
@@ -181,6 +186,7 @@ void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
 
 	vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_probe_power_state);
 
 /*
  * pci_set_power_state() wrapper handling devices which perform a soft reset on
@@ -219,6 +225,7 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_set_power_state);
 
 int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
@@ -326,6 +333,7 @@ int vfio_pci_enable(struct vfio_pci_device *vdev)
 	vfio_pci_disable(vdev);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_enable);
 
 void vfio_pci_disable(struct vfio_pci_device *vdev)
 {
@@ -424,6 +432,7 @@ void vfio_pci_disable(struct vfio_pci_device *vdev)
 	if (!vdev->disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D3hot);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_disable);
 
 void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
 			bool nointxmask, bool disable_idle_d3)
@@ -431,6 +440,7 @@ void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
 	vdev->nointxmask = nointxmask;
 	vdev->disable_idle_d3 = disable_idle_d3;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_refresh_config);
 
 static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 {
@@ -615,6 +625,7 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_register_dev_region);
 
 long vfio_pci_ioctl(void *device_data,
 		   unsigned int cmd, unsigned long arg)
@@ -1069,6 +1080,7 @@ long vfio_pci_ioctl(void *device_data,
 
 	return -ENOTTY;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_ioctl);
 
 static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
 			   size_t count, loff_t *ppos, bool iswrite)
@@ -1110,6 +1122,7 @@ ssize_t vfio_pci_read(void *device_data, char __user *buf,
 
 	return vfio_pci_rw(device_data, buf, count, ppos, false);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_read);
 
 ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 			      size_t count, loff_t *ppos)
@@ -1119,6 +1132,7 @@ ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_write);
 
 int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
@@ -1181,6 +1195,7 @@ int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       req_len, vma->vm_page_prot);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_mmap);
 
 void vfio_pci_request(void *device_data, unsigned int count)
 {
@@ -1202,6 +1217,7 @@ void vfio_pci_request(void *device_data, unsigned int count)
 
 	mutex_unlock(&vdev->igate);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_request);
 
 static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 						  pci_channel_state_t state)
@@ -1231,9 +1247,10 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 	return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
-const struct pci_error_handlers vfio_err_handlers = {
+const struct pci_error_handlers vfio_pci_err_handlers = {
 	.error_detected = vfio_pci_aer_err_detected,
 };
+EXPORT_SYMBOL_GPL(vfio_pci_err_handlers);
 
 static DEFINE_MUTEX(reflck_lock);
 
@@ -1300,6 +1317,7 @@ int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
 
 	return PTR_ERR_OR_ZERO(vdev->reflck);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_reflck_attach);
 
 static void vfio_pci_reflck_release(struct kref *kref)
 {
@@ -1315,6 +1333,7 @@ void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
 {
 	kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_reflck_put);
 
 struct vfio_devices {
 	struct vfio_device **devices;
@@ -1429,7 +1448,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 	kfree(devs.devices);
 }
 
-void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
+void vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 {
 	char *p, *id;
 	int rc;
@@ -1469,3 +1488,22 @@ void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 				class, class_mask);
 	}
 }
+EXPORT_SYMBOL_GPL(vfio_pci_fill_ids);
+
+static int __init vfio_pci_common_init(void)
+{
+	/* Allocate shared config space permision data used by all devices */
+	return vfio_pci_init_perm_bits();
+}
+module_init(vfio_pci_common_init);
+
+static void __exit vfio_pci_common_exit(void)
+{
+	vfio_pci_uninit_perm_bits();
+}
+module_exit(vfio_pci_common_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 274c99311edc..f0891bd8444c 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -35,13 +35,6 @@
 	((offset >= PCI_BASE_ADDRESS_0 && offset < PCI_BASE_ADDRESS_5 + 4) || \
 	 (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
 
-/*
- * vfio_perm_bits_sem: prorects the shared perm_bits alloc/free
- * vfio_pci_perm_bits_users: tracks the user of the shared perm_bits
- */
-static DEFINE_SEMAPHORE(vfio_perm_bits_sem);
-static int vfio_pci_perm_bits_users;
-
 /*
  * Lengths of PCI Config Capabilities
  *   0: Removed from the user visible capability list
@@ -1002,7 +995,7 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 /*
  * Initialize the shared permission tables
  */
-static void vfio_pci_uninit_perm_bits_internal(void)
+void vfio_pci_uninit_perm_bits(void)
 {
 	free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
 
@@ -1016,30 +1009,10 @@ static void vfio_pci_uninit_perm_bits_internal(void)
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
 }
 
-void vfio_pci_uninit_perm_bits(void)
-{
-	down(&vfio_perm_bits_sem);
-
-	if (--vfio_pci_perm_bits_users > 0)
-		goto out;
-
-	vfio_pci_uninit_perm_bits_internal();
-
-out:
-	up(&vfio_perm_bits_sem);
-}
-
 int __init vfio_pci_init_perm_bits(void)
 {
 	int ret;
 
-	down(&vfio_perm_bits_sem);
-
-	if (++vfio_pci_perm_bits_users > 1) {
-		ret = 0;
-		goto out;
-	}
-
 	/* Basic config space */
 	ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);
 
@@ -1057,10 +1030,8 @@ int __init vfio_pci_init_perm_bits(void)
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 
 	if (ret)
-		vfio_pci_uninit_perm_bits_internal();
+		vfio_pci_uninit_perm_bits();
 
-out:
-	up(&vfio_perm_bits_sem);
 	return ret;
 }
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 5d1739460559..562b7c1c06f7 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -135,7 +135,7 @@ struct vfio_pci_device {
 #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
 #define irq_is(vdev, type) (vdev->irq_type == type)
 
-extern const struct pci_error_handlers vfio_err_handlers;
+extern const struct pci_error_handlers vfio_pci_err_handlers;
 
 static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 {
diff --git a/samples/Kconfig b/samples/Kconfig
index 1513feff0255..97c1cc07f657 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -171,9 +171,7 @@ config SAMPLE_VFS
 
 config SAMPLE_VFIO_MDEV_PCI
 	tristate "Sample driver for wrapping PCI device as a mdev"
-	depends on PCI && EVENTFD && VFIO_MDEV_DEVICE
-	select VFIO_VIRQFD
-	select IRQ_BYPASS_MANAGER
+	select VFIO_PCI_COMMON
 	help
 	  Sample driver for wrapping a PCI device as a mdev. Once bound to
 	  this driver, device passthru should through mdev path.


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c
  2019-11-21 11:23 ` [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c Liu Yi L
@ 2019-12-15 22:46   ` Alex Williamson
  2019-12-16 11:59     ` Liu, Yi L
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2019-12-15 22:46 UTC (permalink / raw)
  To: Liu Yi L
  Cc: kwankhede, kevin.tian, baolu.lu, yi.y.sun, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

On Thu, 21 Nov 2019 19:23:40 +0800
Liu Yi L <yi.l.liu@intel.com> wrote:

> This patch replaces the vfio_pci_driver reference in vfio_pci.c with
> pci_dev_driver(vdev->pdev) which is more helpful to make the functions
> be generic to module types.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci.c | 33 ++++++++++++++++++---------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b04e43a..2096e66 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1460,24 +1460,25 @@ static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck)
>  
>  static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
>  {
> -	struct vfio_pci_reflck **preflck = data;
> +	struct vfio_pci_device *vdev = data;
> +	struct vfio_pci_reflck **preflck = &vdev->reflck;
>  	struct vfio_device *device;
> -	struct vfio_pci_device *vdev;
> +	struct vfio_pci_device *tmp;
>  
>  	device = vfio_device_get_from_dev(&pdev->dev);
>  	if (!device)
>  		return 0;
>  
> -	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
> +	if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
>  		vfio_device_put(device);
>  		return 0;
>  	}
>  
> -	vdev = vfio_device_data(device);
> +	tmp = vfio_device_data(device);
>  
> -	if (vdev->reflck) {
> -		vfio_pci_reflck_get(vdev->reflck);
> -		*preflck = vdev->reflck;
> +	if (tmp->reflck) {
> +		vfio_pci_reflck_get(tmp->reflck);
> +		*preflck = tmp->reflck;
>  		vfio_device_put(device);
>  		return 1;
>  	}
> @@ -1494,7 +1495,7 @@ static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
>  
>  	if (pci_is_root_bus(vdev->pdev->bus) ||
>  	    vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find,
> -					  &vdev->reflck, slot) <= 0)
> +					  vdev, slot) <= 0)
>  		vdev->reflck = vfio_pci_reflck_alloc();
>  
>  	mutex_unlock(&reflck_lock);
> @@ -1519,6 +1520,7 @@ static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
>  
>  struct vfio_devices {
>  	struct vfio_device **devices;
> +	struct vfio_pci_device *vdev;
>  	int cur_index;
>  	int max_index;
>  };
> @@ -1527,7 +1529,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
>  {
>  	struct vfio_devices *devs = data;
>  	struct vfio_device *device;
> -	struct vfio_pci_device *vdev;
> +	struct vfio_pci_device *tmp;
>  
>  	if (devs->cur_index == devs->max_index)
>  		return -ENOSPC;
> @@ -1536,15 +1538,15 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
>  	if (!device)
>  		return -EINVAL;
>  
> -	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
> +	if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) {
>  		vfio_device_put(device);
>  		return -EBUSY;
>  	}
>  
> -	vdev = vfio_device_data(device);
> +	tmp = vfio_device_data(device);
>  
>  	/* Fault if the device is not unused */
> -	if (vdev->refcnt) {
> +	if (tmp->refcnt) {
>  		vfio_device_put(device);
>  		return -EBUSY;
>  	}
> @@ -1590,6 +1592,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
>  	if (!devs.devices)
>  		return;
>  
> +	devs.vdev = vdev;

This could be added to the declaration initializer:

struct vfio_devices devs = { .vdev = vdev, .cur_index = 0 };

It might seem a little less random then.  Thanks,

Alex


>  	if (vfio_pci_for_each_slot_or_bus(vdev->pdev,
>  					  vfio_pci_get_unused_devs,
>  					  &devs, slot))
> @@ -1634,7 +1637,7 @@ static void __exit vfio_pci_cleanup(void)
>  	vfio_pci_uninit_perm_bits();
>  }
>  
> -static void __init vfio_pci_fill_ids(char *ids)
> +static void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
>  {
>  	char *p, *id;
>  	int rc;
> @@ -1662,7 +1665,7 @@ static void __init vfio_pci_fill_ids(char *ids)
>  			continue;
>  		}
>  
> -		rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
> +		rc = pci_add_dynid(driver, vendor, device,
>  				   subvendor, subdevice, class, class_mask, 0);
>  		if (rc)
>  			pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
> @@ -1689,7 +1692,7 @@ static int __init vfio_pci_init(void)
>  	if (ret)
>  		goto out_driver;
>  
> -	vfio_pci_fill_ids(ids);
> +	vfio_pci_fill_ids(ids, &vfio_pci_driver);
>  
>  	return 0;
>  


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
  2019-12-15 22:46   ` Alex Williamson
@ 2019-12-16 11:57     ` Liu, Yi L
  2019-12-16 17:42       ` Alex Williamson
  0 siblings, 1 reply; 18+ messages in thread
From: Liu, Yi L @ 2019-12-16 11:57 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kwankhede, Tian, Kevin, baolu.lu, Sun, Yi Y, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Monday, December 16, 2019 6:47 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
> 
> On Thu, 21 Nov 2019 19:23:45 +0800
> Liu Yi L <yi.l.liu@intel.com> wrote:
> 
> > This patch add a user numer track for the shared cap/ecap_perms bits,
> > and the alloc/free will hold a semaphore to protect the operations.
> > With the changes, first caller of vfio_pci_init_perm_bits() will
> > initialize the bits. While the last caller of vfio_pci_uninit_perm_bits()
> > will free the bits. This is a preparation to have multiple cap/ecap_perms
> > bits users.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++++--
> >  1 file changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> > index f0891bd..274c993 100644
> > --- a/drivers/vfio/pci/vfio_pci_config.c
> > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > @@ -36,6 +36,13 @@
> >  	 (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
> >
> >  /*
> > + * vfio_perm_bits_sem: prorects the shared perm_bits alloc/free
> > + * vfio_pci_perm_bits_users: tracks the user of the shared perm_bits
> > + */
> > +static DEFINE_SEMAPHORE(vfio_perm_bits_sem);
> > +static int vfio_pci_perm_bits_users;
> > +
> > +/*
> >   * Lengths of PCI Config Capabilities
> >   *   0: Removed from the user visible capability list
> >   *   FF: Variable length
> > @@ -995,7 +1002,7 @@ static int __init init_pci_ext_cap_pwr_perm(struct
> perm_bits *perm)
> >  /*
> >   * Initialize the shared permission tables
> >   */
> > -void vfio_pci_uninit_perm_bits(void)
> > +static void vfio_pci_uninit_perm_bits_internal(void)
> >  {
> >  	free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
> >
> > @@ -1009,10 +1016,30 @@ void vfio_pci_uninit_perm_bits(void)
> >  	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
> >  }
> >
> > +void vfio_pci_uninit_perm_bits(void)
> > +{
> > +	down(&vfio_perm_bits_sem);
> > +
> > +	if (--vfio_pci_perm_bits_users > 0)
> > +		goto out;
> > +
> > +	vfio_pci_uninit_perm_bits_internal();
> > +
> > +out:
> > +	up(&vfio_perm_bits_sem);
> > +}
> > +
> >  int __init vfio_pci_init_perm_bits(void)
> >  {
> >  	int ret;
> >
> > +	down(&vfio_perm_bits_sem);
> > +
> > +	if (++vfio_pci_perm_bits_users > 1) {
> > +		ret = 0;
> > +		goto out;
> > +	}
> > +
> >  	/* Basic config space */
> >  	ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);
> >
> > @@ -1030,8 +1057,10 @@ int __init vfio_pci_init_perm_bits(void)
> >  	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
> >
> >  	if (ret)
> > -		vfio_pci_uninit_perm_bits();
> > +		vfio_pci_uninit_perm_bits_internal();
> >
> > +out:
> > +	up(&vfio_perm_bits_sem);
> >  	return ret;
> >  }
> >
> 
> Hi Yi,
> 
> Sorry for slowness in providing feedback on this series.  If we
> provided a vfio-pci-common module that vfio-pci and vfio-mdev-pci
> depend on, doesn't this entire problem go away? 

I checked previous email, export the common functions out of
vfio-pci module was proposed in RFC v2. But at that time, I didn't
propose to have a separate module. So I guess it may be just correct
way. Below is the reply at that time.

https://lkml.org/lkml/2019/3/19/756

> I played a little bit
> with this in the crude patch below, it seems to work.  To finish this,
> I think we'd move the function declarations out of the "private" header
> file and into one under include/linux, then we could also move
> vfio_mdev_pci.c to the samples directory like we intended originally.
> I know you had tried to link things from samples and it didn't work,
> but is the below a better attempt at resolving this?  It commits us to
> exporting a bunch of functions, we'll need to decide whether that's a
> good idea.  Thanks,

I played with your patch and added a crude patch to move
vfio-mdev-pci to samples directory. I can see below errors with
either CONFIG_VFIO_PCI_COMMON=y and
CONFIG_SAMPLE_VFIO_MDEV_PCI=y, or
CONFIG_VFIO_PCI_COMMON=m and
CONFIG_SAMPLE_VFIO_MDEV_PCI=m.

But CONFIG_VFIO_PCI_COMMON works well with CONFIG_VFIO_PCI...

Kernel: arch/x86/boot/bzImage is ready  (#88)
ERROR: "vfio_pci_fill_ids" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_err_handlers" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_reflck_attach" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_write" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_disable" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_reflck_put" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_refresh_config" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_set_power_state" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_enable" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_set_vga_decode" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_probe_power_state" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_mmap" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_ioctl" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
ERROR: "vfio_pci_read" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
scripts/Makefile.modpost:93: recipe for target '__modpost' failed

So I'm afraid that it still cannot resolve the problem which we encountered
when trying to place vfio-mdev-pci in samples/.

Regards,
Yi Liu

================= [the crude patch] =================
From fa860ff15ab188481141f7bd2b9cb3a1d500f24d Mon Sep 17 00:00:00 2001
From: Liu Yi L <yi.l.liu@intel.com>
Date: Sun, 15 Dec 2019 19:16:54 +0800
Subject: [PATCH] vfio/pci/sample: move vfio-pci-mdev to samples

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/Makefile             |   2 -
 drivers/vfio/pci/vfio_mdev_pci.c      | 421 ----------------------------------
 drivers/vfio/pci/vfio_pci.c           |   2 +-
 drivers/vfio/pci/vfio_pci_common.c    |   2 +-
 drivers/vfio/pci/vfio_pci_config.c    |   2 +-
 drivers/vfio/pci/vfio_pci_igd.c       |   2 +-
 drivers/vfio/pci/vfio_pci_intrs.c     |   2 +-
 drivers/vfio/pci/vfio_pci_nvlink2.c   |   2 +-
 drivers/vfio/pci/vfio_pci_private.h   | 228 ------------------
 drivers/vfio/pci/vfio_pci_rdwr.c      |   2 +-
 include/linux/vfio_pci_private.h      | 228 ++++++++++++++++++
 samples/Makefile                      |   1 +
 samples/vfio-mdev-pci/Makefile        |   4 +
 samples/vfio-mdev-pci/vfio_mdev_pci.c | 421 ++++++++++++++++++++++++++++++++++
 14 files changed, 661 insertions(+), 658 deletions(-)
 delete mode 100644 drivers/vfio/pci/vfio_mdev_pci.c
 delete mode 100644 drivers/vfio/pci/vfio_pci_private.h
 create mode 100644 include/linux/vfio_pci_private.h
 create mode 100644 samples/vfio-mdev-pci/Makefile
 create mode 100644 samples/vfio-mdev-pci/vfio_mdev_pci.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 9f599cb..ad60cfd 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -6,8 +6,6 @@ vfio-pci-common-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-common-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 
 vfio-pci-y := vfio_pci.o
-vfio-mdev-pci-y := vfio_mdev_pci.o
 
 obj-$(CONFIG_VFIO_PCI_COMMON) += vfio-pci-common.o
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
-obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o
diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c
deleted file mode 100644
index e6b070f..0000000
--- a/drivers/vfio/pci/vfio_mdev_pci.c
+++ /dev/null
@@ -1,421 +0,0 @@
-/*
- * Copyright (c) 2019 Intel Corporation.
- *     Author: Liu Yi L <yi.l.liu@intel.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Derived from original vfio_pci.c:
- * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
- *     Author: Alex Williamson <alex.williamson@redhat.com>
- *
- * Derived from original vfio:
- * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
- * Author: Tom Lyon, pugs@cisco.com
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/device.h>
-#include <linux/eventfd.h>
-#include <linux/file.h>
-#include <linux/interrupt.h>
-#include <linux/iommu.h>
-#include <linux/module.h>
-#include <linux/mutex.h>
-#include <linux/notifier.h>
-#include <linux/pci.h>
-#include <linux/pm_runtime.h>
-#include <linux/slab.h>
-#include <linux/types.h>
-#include <linux/uaccess.h>
-#include <linux/vfio.h>
-#include <linux/vgaarb.h>
-#include <linux/nospec.h>
-#include <linux/mdev.h>
-
-#include "vfio_pci_private.h"
-
-#define DRIVER_VERSION  "0.1"
-#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
-#define DRIVER_DESC     "VFIO Mdev PCI - Sample driver for PCI device as a mdev"
-
-#define VFIO_MDEV_PCI_NAME  "vfio-mdev-pci"
-
-static char ids[1024] __initdata;
-module_param_string(ids, ids, sizeof(ids), 0);
-MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio-mdev-pci driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
-
-static bool nointxmask;
-module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(nointxmask,
-		  "Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.");
-
-#ifdef CONFIG_VFIO_PCI_VGA
-static bool disable_vga;
-module_param(disable_vga, bool, S_IRUGO);
-MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-mdev-pci");
-#endif
-
-static bool disable_idle_d3;
-module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(disable_idle_d3,
-		 "Disable using the PCI D3 low power state for idle, unused devices");
-
-static struct pci_driver vfio_mdev_pci_driver;
-
-struct vfio_mdev_pci_device {
-	struct vfio_pci_device vdev;
-	struct mdev_parent_ops ops;
-	struct attribute_group *groups[2];
-	struct attribute_group attr;
-	atomic_t avail;
-};
-
-static ssize_t
-available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
-{
-	struct vfio_mdev_pci_device *vmdev;
-
-	vmdev = pci_get_drvdata(to_pci_dev(dev));
-
-	return sprintf(buf, "%d\n", atomic_read(&vmdev->avail));
-}
-
-MDEV_TYPE_ATTR_RO(available_instances);
-
-static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
-		char *buf)
-{
-	return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
-}
-
-MDEV_TYPE_ATTR_RO(device_api);
-
-static struct attribute *vfio_mdev_pci_types_attrs[] = {
-	&mdev_type_attr_device_api.attr,
-	&mdev_type_attr_available_instances.attr,
-	NULL,
-};
-
-struct vfio_mdev_pci {
-	struct vfio_pci_device *vdev;
-	struct mdev_device *mdev;
-};
-
-static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev)
-{
-	struct device *pdev;
-	struct vfio_mdev_pci_device *vmdev;
-	struct vfio_mdev_pci *pmdev;
-	int ret;
-
-	pdev = mdev_parent_dev(mdev);
-	vmdev = dev_get_drvdata(pdev);
-
-	if (atomic_dec_if_positive(&vmdev->avail) < 0)
-		return -ENOSPC;
-
-	pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL);
-	if (!pmdev) {
-		atomic_inc(&vmdev->avail);
-		return -ENOMEM;
-	}
-
-	pmdev->mdev = mdev;
-	pmdev->vdev = &vmdev->vdev;
-	mdev_set_drvdata(mdev, pmdev);
-	ret = mdev_set_iommu_device(mdev_dev(mdev), pdev);
-	if (ret) {
-		pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n",
-			__func__, dev_name(mdev_dev(mdev)), dev_name(pdev));
-		kfree(pmdev);
-		atomic_inc(&vmdev->avail);
-		return ret;
-	}
-
-	pr_info("%s, creation succeeded for mdev: %s\n", __func__,
-		     dev_name(mdev_dev(mdev)));
-	return 0;
-}
-
-static int vfio_mdev_pci_remove(struct mdev_device *mdev)
-{
-	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
-	struct vfio_mdev_pci_device *vmdev;
-
-	vmdev = container_of(pmdev->vdev, struct vfio_mdev_pci_device, vdev);
-
-	kfree(pmdev);
-	atomic_inc(&vmdev->avail);
-	pr_info("%s, succeeded for mdev: %s\n", __func__,
-		     dev_name(mdev_dev(mdev)));
-
-	return 0;
-}
-
-static int vfio_mdev_pci_open(struct mdev_device *mdev)
-{
-	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
-	struct vfio_pci_device *vdev = pmdev->vdev;
-	int ret = 0;
-
-	if (!try_module_get(THIS_MODULE))
-		return -ENODEV;
-
-	vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3);
-
-	mutex_lock(&vdev->reflck->lock);
-
-	if (!vdev->refcnt) {
-		ret = vfio_pci_enable(vdev);
-		if (ret)
-			goto error;
-
-		vfio_spapr_pci_eeh_open(vdev->pdev);
-	}
-	vdev->refcnt++;
-error:
-	mutex_unlock(&vdev->reflck->lock);
-	if (!ret)
-		pr_info("Succeeded to open mdev: %s on pf: %s\n",
-		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
-	else {
-		pr_info("Failed to open mdev: %s on pf: %s\n",
-		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
-		module_put(THIS_MODULE);
-	}
-	return ret;
-}
-
-static void vfio_mdev_pci_release(struct mdev_device *mdev)
-{
-	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
-	struct vfio_pci_device *vdev = pmdev->vdev;
-
-	pr_info("Release mdev: %s on pf: %s\n",
-		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
-
-	mutex_lock(&vdev->reflck->lock);
-
-	if (!(--vdev->refcnt)) {
-		vfio_spapr_pci_eeh_release(vdev->pdev);
-		vfio_pci_disable(vdev);
-	}
-
-	mutex_unlock(&vdev->reflck->lock);
-
-	module_put(THIS_MODULE);
-}
-
-static long vfio_mdev_pci_ioctl(struct mdev_device *mdev, unsigned int cmd,
-			     unsigned long arg)
-{
-	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
-
-	return vfio_pci_ioctl(pmdev->vdev, cmd, arg);
-}
-
-static int vfio_mdev_pci_mmap(struct mdev_device *mdev,
-				struct vm_area_struct *vma)
-{
-	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
-
-	return vfio_pci_mmap(pmdev->vdev, vma);
-}
-
-static ssize_t vfio_mdev_pci_read(struct mdev_device *mdev, char __user *buf,
-			size_t count, loff_t *ppos)
-{
-	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
-
-	return vfio_pci_read(pmdev->vdev, buf, count, ppos);
-}
-
-static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev,
-				const char __user *buf,
-				size_t count, loff_t *ppos)
-{
-	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
-
-	return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos);
-}
-
-static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
-				       const struct pci_device_id *id)
-{
-	struct vfio_mdev_pci_device *vmdev;
-	struct vfio_pci_device *vdev;
-	const struct mdev_parent_ops *ops;
-	int ret;
-
-	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-		return -EINVAL;
-
-	/*
-	 * Prevent binding to PFs with VFs enabled, this too easily allows
-	 * userspace instance with VFs and PFs from the same device, which
-	 * cannot work.  Disabling SR-IOV here would initiate removing the
-	 * VFs, which would unbind the driver, which is prone to blocking
-	 * if that VF is also in use by vfio-pci or vfio-mdev-pci. Just
-	 * reject these PFs and let the user sort it out.
-	 */
-	if (pci_num_vf(pdev)) {
-		pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
-		return -EBUSY;
-	}
-
-	vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL);
-	if (!vmdev)
-		return -ENOMEM;
-
-	vmdev->attr.name = kasprintf(GFP_KERNEL,
-				     "%04x:%04x:%04x:%04x:%06x:%02x",
-				     pdev->vendor, pdev->device,
-				     pdev->subsystem_vendor,
-				     pdev->subsystem_device, pdev->class,
-				     pdev->revision);
-	if (!vmdev->attr.name) {
-		kfree(vmdev);
-		return -ENOMEM;
-	}
-
-	atomic_set(&vmdev->avail, 1);
-
-	vmdev->attr.attrs = vfio_mdev_pci_types_attrs;
-	vmdev->groups[0] = &vmdev->attr;
-
-	vmdev->ops.supported_type_groups = vmdev->groups;
-	vmdev->ops.create = vfio_mdev_pci_create;
-	vmdev->ops.remove = vfio_mdev_pci_remove;
-	vmdev->ops.open	= vfio_mdev_pci_open;
-	vmdev->ops.release = vfio_mdev_pci_release;
-	vmdev->ops.read = vfio_mdev_pci_read;
-	vmdev->ops.write = vfio_mdev_pci_write;
-	vmdev->ops.mmap = vfio_mdev_pci_mmap;
-	vmdev->ops.ioctl = vfio_mdev_pci_ioctl;
-	ops = &vmdev->ops;
-
-	vdev = &vmdev->vdev;
-	vdev->pdev = pdev;
-	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	mutex_init(&vdev->igate);
-	spin_lock_init(&vdev->irqlock);
-	mutex_init(&vdev->ioeventfds_lock);
-	INIT_LIST_HEAD(&vdev->ioeventfds_list);
-	vdev->nointxmask = nointxmask;
-#ifdef CONFIG_VFIO_PCI_VGA
-	vdev->disable_vga = disable_vga;
-#endif
-	vdev->disable_idle_d3 = disable_idle_d3;
-
-	pci_set_drvdata(pdev, vmdev);
-
-	ret = vfio_pci_reflck_attach(vdev);
-	if (ret) {
-		pci_set_drvdata(pdev, NULL);
-		kfree(vdev);
-		return ret;
-	}
-
-	if (vfio_pci_is_vga(pdev)) {
-		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
-		vga_set_legacy_decoding(pdev,
-					vfio_pci_set_vga_decode(vdev, false));
-	}
-
-	vfio_pci_probe_power_state(vdev);
-
-	if (!vdev->disable_idle_d3) {
-		/*
-		 * pci-core sets the device power state to an unknown value at
-		 * bootup and after being removed from a driver.  The only
-		 * transition it allows from this unknown state is to D0, which
-		 * typically happens when a driver calls pci_enable_device().
-		 * We're not ready to enable the device yet, but we do want to
-		 * be able to get to D3.  Therefore first do a D0 transition
-		 * before going to D3.
-		 */
-		vfio_pci_set_power_state(vdev, PCI_D0);
-		vfio_pci_set_power_state(vdev, PCI_D3hot);
-	}
-
-	ret = mdev_register_device(&pdev->dev, ops);
-	if (ret)
-		pr_err("Cannot register mdev for device %s\n",
-			dev_name(&pdev->dev));
-	else
-		pr_info("Wrap device %s as a mdev\n", dev_name(&pdev->dev));
-
-	return ret;
-}
-
-static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
-{
-	struct vfio_mdev_pci_device *vmdev;
-	struct vfio_pci_device *vdev;
-
-	mdev_unregister_device(&pdev->dev);
-
-	vmdev = pci_get_drvdata(pdev);
-	if (!vmdev)
-		return;
-
-	vdev = &vmdev->vdev;
-
-	vfio_pci_reflck_put(vdev->reflck);
-
-	kfree(vdev->region);
-	mutex_destroy(&vdev->ioeventfds_lock);
-
-	if (!disable_idle_d3)
-		vfio_pci_set_power_state(vdev, PCI_D0);
-
-	kfree(vdev->pm_save);
-
-	if (vfio_pci_is_vga(pdev)) {
-		vga_client_register(pdev, NULL, NULL, NULL);
-		vga_set_legacy_decoding(pdev,
-				VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
-				VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
-	}
-
-	kfree(vmdev->attr.name);
-	kfree(vmdev);
-}
-
-static struct pci_driver vfio_mdev_pci_driver = {
-	.name		= VFIO_MDEV_PCI_NAME,
-	.id_table	= NULL, /* only dynamic ids */
-	.probe		= vfio_mdev_pci_driver_probe,
-	.remove		= vfio_mdev_pci_driver_remove,
-	.err_handler	= &vfio_pci_err_handlers,
-};
-
-static void __exit vfio_mdev_pci_cleanup(void)
-{
-	pci_unregister_driver(&vfio_mdev_pci_driver);
-}
-
-static int __init vfio_mdev_pci_init(void)
-{
-	int ret;
-
-	/* Register and scan for devices */
-	ret = pci_register_driver(&vfio_mdev_pci_driver);
-	if (ret)
-		return ret;
-
-	vfio_pci_fill_ids(ids, &vfio_mdev_pci_driver);
-
-	return 0;
-}
-
-module_init(vfio_mdev_pci_init);
-module_exit(vfio_mdev_pci_cleanup);
-
-MODULE_VERSION(DRIVER_VERSION);
-MODULE_LICENSE("GPL v2");
-MODULE_AUTHOR(DRIVER_AUTHOR);
-MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7047667..5550858 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -28,7 +28,7 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 
-#include "vfio_pci_private.h"
+#include <linux/vfio_pci_private.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c
index dd6f8fd..0523dab 100644
--- a/drivers/vfio/pci/vfio_pci_common.c
+++ b/drivers/vfio/pci/vfio_pci_common.c
@@ -28,7 +28,7 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 
-#include "vfio_pci_private.h"
+#include <linux/vfio_pci_private.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index f0891bd..9040ba2 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -26,7 +26,7 @@
 #include <linux/vfio.h>
 #include <linux/slab.h>
 
-#include "vfio_pci_private.h"
+#include <linux/vfio_pci_private.h>
 
 /* Fake capability ID for standard config space */
 #define PCI_CAP_ID_BASIC	0
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 53d97f4..4566e93 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -15,7 +15,7 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 
-#include "vfio_pci_private.h"
+#include <linux/vfio_pci_private.h>
 
 #define OPREGION_SIGNATURE	"IntelGraphicsMem"
 #define OPREGION_SIZE		(8 * 1024)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3fa3f72..6da3eba 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -20,7 +20,7 @@
 #include <linux/wait.h>
 #include <linux/slab.h>
 
-#include "vfio_pci_private.h"
+#include <linux/vfio_pci_private.h>
 
 /*
  * INTx
diff --git a/drivers/vfio/pci/vfio_pci_nvlink2.c b/drivers/vfio/pci/vfio_pci_nvlink2.c
index f2983f0..7cac168 100644
--- a/drivers/vfio/pci/vfio_pci_nvlink2.c
+++ b/drivers/vfio/pci/vfio_pci_nvlink2.c
@@ -19,7 +19,7 @@
 #include <linux/sched/mm.h>
 #include <linux/mmu_context.h>
 #include <asm/kvm_ppc.h>
-#include "vfio_pci_private.h"
+#include <linux/vfio_pci_private.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
deleted file mode 100644
index 562b7c1..0000000
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ /dev/null
@@ -1,228 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
- *     Author: Alex Williamson <alex.williamson@redhat.com>
- *
- * Derived from original vfio:
- * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
- * Author: Tom Lyon, pugs@cisco.com
- */
-
-#include <linux/mutex.h>
-#include <linux/pci.h>
-#include <linux/irqbypass.h>
-#include <linux/types.h>
-
-#ifndef VFIO_PCI_PRIVATE_H
-#define VFIO_PCI_PRIVATE_H
-
-#define VFIO_PCI_OFFSET_SHIFT   40
-
-#define VFIO_PCI_OFFSET_TO_INDEX(off)	(off >> VFIO_PCI_OFFSET_SHIFT)
-#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
-#define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
-
-/* Special capability IDs predefined access */
-#define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
-#define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
-
-/* Cap maximum number of ioeventfds per device (arbitrary) */
-#define VFIO_PCI_IOEVENTFD_MAX		1000
-
-struct vfio_pci_ioeventfd {
-	struct list_head	next;
-	struct virqfd		*virqfd;
-	void __iomem		*addr;
-	uint64_t		data;
-	loff_t			pos;
-	int			bar;
-	int			count;
-};
-
-struct vfio_pci_irq_ctx {
-	struct eventfd_ctx	*trigger;
-	struct virqfd		*unmask;
-	struct virqfd		*mask;
-	char			*name;
-	bool			masked;
-	struct irq_bypass_producer	producer;
-};
-
-struct vfio_pci_device;
-struct vfio_pci_region;
-
-struct vfio_pci_regops {
-	size_t	(*rw)(struct vfio_pci_device *vdev, char __user *buf,
-		      size_t count, loff_t *ppos, bool iswrite);
-	void	(*release)(struct vfio_pci_device *vdev,
-			   struct vfio_pci_region *region);
-	int	(*mmap)(struct vfio_pci_device *vdev,
-			struct vfio_pci_region *region,
-			struct vm_area_struct *vma);
-	int	(*add_capability)(struct vfio_pci_device *vdev,
-				  struct vfio_pci_region *region,
-				  struct vfio_info_cap *caps);
-};
-
-struct vfio_pci_region {
-	u32				type;
-	u32				subtype;
-	const struct vfio_pci_regops	*ops;
-	void				*data;
-	size_t				size;
-	u32				flags;
-};
-
-struct vfio_pci_dummy_resource {
-	struct resource		resource;
-	int			index;
-	struct list_head	res_next;
-};
-
-struct vfio_pci_reflck {
-	struct kref		kref;
-	struct mutex		lock;
-};
-
-struct vfio_pci_device {
-	struct pci_dev		*pdev;
-	void __iomem		*barmap[PCI_STD_RESOURCE_END + 1];
-	bool			bar_mmap_supported[PCI_STD_RESOURCE_END + 1];
-	u8			*pci_config_map;
-	u8			*vconfig;
-	struct perm_bits	*msi_perm;
-	spinlock_t		irqlock;
-	struct mutex		igate;
-	struct vfio_pci_irq_ctx	*ctx;
-	int			num_ctx;
-	int			irq_type;
-	int			num_regions;
-	struct vfio_pci_region	*region;
-	u8			msi_qmax;
-	u8			msix_bar;
-	u16			msix_size;
-	u32			msix_offset;
-	u32			rbar[7];
-	bool			pci_2_3;
-	bool			virq_disabled;
-	bool			reset_works;
-	bool			extended_caps;
-	bool			bardirty;
-	bool			has_vga;
-	bool			needs_reset;
-	bool			nointx;
-	bool			needs_pm_restore;
-	struct pci_saved_state	*pci_saved_state;
-	struct pci_saved_state	*pm_save;
-	struct vfio_pci_reflck	*reflck;
-	int			refcnt;
-	int			ioeventfds_nr;
-	struct eventfd_ctx	*err_trigger;
-	struct eventfd_ctx	*req_trigger;
-	struct list_head	dummy_resources_list;
-	struct mutex		ioeventfds_lock;
-	struct list_head	ioeventfds_list;
-	bool			nointxmask;
-#ifdef CONFIG_VFIO_PCI_VGA
-	bool			disable_vga;
-#endif
-	bool			disable_idle_d3;
-};
-
-#define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
-#define is_msi(vdev) (vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX)
-#define is_msix(vdev) (vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX)
-#define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
-#define irq_is(vdev, type) (vdev->irq_type == type)
-
-extern const struct pci_error_handlers vfio_pci_err_handlers;
-
-static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
-{
-	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
-}
-
-static inline bool vfio_vga_disabled(struct vfio_pci_device *vdev)
-{
-#ifdef CONFIG_VFIO_PCI_VGA
-	return vdev->disable_vga;
-#else
-	return true;
-#endif
-}
-
-extern void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
-				bool nointxmask, bool disable_idle_d3);
-
-extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
-extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
-
-extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev,
-				   uint32_t flags, unsigned index,
-				   unsigned start, unsigned count, void *data);
-
-extern ssize_t vfio_pci_config_rw(struct vfio_pci_device *vdev,
-				  char __user *buf, size_t count,
-				  loff_t *ppos, bool iswrite);
-
-extern ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
-			       size_t count, loff_t *ppos, bool iswrite);
-
-extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
-			       size_t count, loff_t *ppos, bool iswrite);
-
-extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
-			       uint64_t data, int count, int fd);
-
-extern int vfio_pci_init_perm_bits(void);
-extern void vfio_pci_uninit_perm_bits(void);
-
-extern int vfio_config_init(struct vfio_pci_device *vdev);
-extern void vfio_config_free(struct vfio_pci_device *vdev);
-
-extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
-					unsigned int type, unsigned int subtype,
-					const struct vfio_pci_regops *ops,
-					size_t size, u32 flags, void *data);
-
-extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
-				    pci_power_t state);
-extern unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga);
-extern int vfio_pci_enable(struct vfio_pci_device *vdev);
-extern void vfio_pci_disable(struct vfio_pci_device *vdev);
-extern long vfio_pci_ioctl(void *device_data,
-			unsigned int cmd, unsigned long arg);
-extern ssize_t vfio_pci_read(void *device_data, char __user *buf,
-			size_t count, loff_t *ppos);
-extern ssize_t vfio_pci_write(void *device_data, const char __user *buf,
-			size_t count, loff_t *ppos);
-extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma);
-extern void vfio_pci_request(void *device_data, unsigned int count);
-extern void vfio_pci_fill_ids(char *ids, struct pci_driver *driver);
-extern int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
-extern void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
-extern void vfio_pci_probe_power_state(struct vfio_pci_device *vdev);
-
-#ifdef CONFIG_VFIO_PCI_IGD
-extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
-#else
-static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
-{
-	return -ENODEV;
-}
-#endif
-#ifdef CONFIG_VFIO_PCI_NVLINK2
-extern int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev);
-extern int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev);
-#else
-static inline int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev)
-{
-	return -ENODEV;
-}
-
-static inline int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
-{
-	return -ENODEV;
-}
-#endif
-#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 0120d83..72bc689 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -17,7 +17,7 @@
 #include <linux/vfio.h>
 #include <linux/vgaarb.h>
 
-#include "vfio_pci_private.h"
+#include <linux/vfio_pci_private.h>
 
 #ifdef __LITTLE_ENDIAN
 #define vfio_ioread64	ioread64
diff --git a/include/linux/vfio_pci_private.h b/include/linux/vfio_pci_private.h
new file mode 100644
index 0000000..562b7c1
--- /dev/null
+++ b/include/linux/vfio_pci_private.h
@@ -0,0 +1,228 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/irqbypass.h>
+#include <linux/types.h>
+
+#ifndef VFIO_PCI_PRIVATE_H
+#define VFIO_PCI_PRIVATE_H
+
+#define VFIO_PCI_OFFSET_SHIFT   40
+
+#define VFIO_PCI_OFFSET_TO_INDEX(off)	(off >> VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
+
+/* Special capability IDs predefined access */
+#define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
+#define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
+
+/* Cap maximum number of ioeventfds per device (arbitrary) */
+#define VFIO_PCI_IOEVENTFD_MAX		1000
+
+struct vfio_pci_ioeventfd {
+	struct list_head	next;
+	struct virqfd		*virqfd;
+	void __iomem		*addr;
+	uint64_t		data;
+	loff_t			pos;
+	int			bar;
+	int			count;
+};
+
+struct vfio_pci_irq_ctx {
+	struct eventfd_ctx	*trigger;
+	struct virqfd		*unmask;
+	struct virqfd		*mask;
+	char			*name;
+	bool			masked;
+	struct irq_bypass_producer	producer;
+};
+
+struct vfio_pci_device;
+struct vfio_pci_region;
+
+struct vfio_pci_regops {
+	size_t	(*rw)(struct vfio_pci_device *vdev, char __user *buf,
+		      size_t count, loff_t *ppos, bool iswrite);
+	void	(*release)(struct vfio_pci_device *vdev,
+			   struct vfio_pci_region *region);
+	int	(*mmap)(struct vfio_pci_device *vdev,
+			struct vfio_pci_region *region,
+			struct vm_area_struct *vma);
+	int	(*add_capability)(struct vfio_pci_device *vdev,
+				  struct vfio_pci_region *region,
+				  struct vfio_info_cap *caps);
+};
+
+struct vfio_pci_region {
+	u32				type;
+	u32				subtype;
+	const struct vfio_pci_regops	*ops;
+	void				*data;
+	size_t				size;
+	u32				flags;
+};
+
+struct vfio_pci_dummy_resource {
+	struct resource		resource;
+	int			index;
+	struct list_head	res_next;
+};
+
+struct vfio_pci_reflck {
+	struct kref		kref;
+	struct mutex		lock;
+};
+
+struct vfio_pci_device {
+	struct pci_dev		*pdev;
+	void __iomem		*barmap[PCI_STD_RESOURCE_END + 1];
+	bool			bar_mmap_supported[PCI_STD_RESOURCE_END + 1];
+	u8			*pci_config_map;
+	u8			*vconfig;
+	struct perm_bits	*msi_perm;
+	spinlock_t		irqlock;
+	struct mutex		igate;
+	struct vfio_pci_irq_ctx	*ctx;
+	int			num_ctx;
+	int			irq_type;
+	int			num_regions;
+	struct vfio_pci_region	*region;
+	u8			msi_qmax;
+	u8			msix_bar;
+	u16			msix_size;
+	u32			msix_offset;
+	u32			rbar[7];
+	bool			pci_2_3;
+	bool			virq_disabled;
+	bool			reset_works;
+	bool			extended_caps;
+	bool			bardirty;
+	bool			has_vga;
+	bool			needs_reset;
+	bool			nointx;
+	bool			needs_pm_restore;
+	struct pci_saved_state	*pci_saved_state;
+	struct pci_saved_state	*pm_save;
+	struct vfio_pci_reflck	*reflck;
+	int			refcnt;
+	int			ioeventfds_nr;
+	struct eventfd_ctx	*err_trigger;
+	struct eventfd_ctx	*req_trigger;
+	struct list_head	dummy_resources_list;
+	struct mutex		ioeventfds_lock;
+	struct list_head	ioeventfds_list;
+	bool			nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	bool			disable_vga;
+#endif
+	bool			disable_idle_d3;
+};
+
+#define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
+#define is_msi(vdev) (vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX)
+#define is_msix(vdev) (vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX)
+#define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
+#define irq_is(vdev, type) (vdev->irq_type == type)
+
+extern const struct pci_error_handlers vfio_pci_err_handlers;
+
+static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
+{
+	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
+}
+
+static inline bool vfio_vga_disabled(struct vfio_pci_device *vdev)
+{
+#ifdef CONFIG_VFIO_PCI_VGA
+	return vdev->disable_vga;
+#else
+	return true;
+#endif
+}
+
+extern void vfio_pci_refresh_config(struct vfio_pci_device *vdev,
+				bool nointxmask, bool disable_idle_d3);
+
+extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
+extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
+
+extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev,
+				   uint32_t flags, unsigned index,
+				   unsigned start, unsigned count, void *data);
+
+extern ssize_t vfio_pci_config_rw(struct vfio_pci_device *vdev,
+				  char __user *buf, size_t count,
+				  loff_t *ppos, bool iswrite);
+
+extern ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
+			       size_t count, loff_t *ppos, bool iswrite);
+
+extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
+			       size_t count, loff_t *ppos, bool iswrite);
+
+extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
+			       uint64_t data, int count, int fd);
+
+extern int vfio_pci_init_perm_bits(void);
+extern void vfio_pci_uninit_perm_bits(void);
+
+extern int vfio_config_init(struct vfio_pci_device *vdev);
+extern void vfio_config_free(struct vfio_pci_device *vdev);
+
+extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
+					unsigned int type, unsigned int subtype,
+					const struct vfio_pci_regops *ops,
+					size_t size, u32 flags, void *data);
+
+extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
+				    pci_power_t state);
+extern unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga);
+extern int vfio_pci_enable(struct vfio_pci_device *vdev);
+extern void vfio_pci_disable(struct vfio_pci_device *vdev);
+extern long vfio_pci_ioctl(void *device_data,
+			unsigned int cmd, unsigned long arg);
+extern ssize_t vfio_pci_read(void *device_data, char __user *buf,
+			size_t count, loff_t *ppos);
+extern ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+			size_t count, loff_t *ppos);
+extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma);
+extern void vfio_pci_request(void *device_data, unsigned int count);
+extern void vfio_pci_fill_ids(char *ids, struct pci_driver *driver);
+extern int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
+extern void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
+extern void vfio_pci_probe_power_state(struct vfio_pci_device *vdev);
+
+#ifdef CONFIG_VFIO_PCI_IGD
+extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
+#else
+static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
+{
+	return -ENODEV;
+}
+#endif
+#ifdef CONFIG_VFIO_PCI_NVLINK2
+extern int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev);
+extern int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev);
+#else
+static inline int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev)
+{
+	return -ENODEV;
+}
+
+static inline int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
+{
+	return -ENODEV;
+}
+#endif
+#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/samples/Makefile b/samples/Makefile
index 7d6e4ca..a5f8867 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -19,4 +19,5 @@ obj-$(CONFIG_SAMPLE_TRACE_EVENTS)	+= trace_events/
 obj-$(CONFIG_SAMPLE_TRACE_PRINTK)	+= trace_printk/
 obj-$(CONFIG_VIDEO_PCI_SKELETON)	+= v4l/
 obj-y					+= vfio-mdev/
+obj-y					+= vfio-mdev-pci/
 subdir-$(CONFIG_SAMPLE_VFS)		+= vfs
diff --git a/samples/vfio-mdev-pci/Makefile b/samples/vfio-mdev-pci/Makefile
new file mode 100644
index 0000000..41b2139
--- /dev/null
+++ b/samples/vfio-mdev-pci/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+vfio-mdev-pci-y := vfio_mdev_pci.o
+
+obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o
diff --git a/samples/vfio-mdev-pci/vfio_mdev_pci.c b/samples/vfio-mdev-pci/vfio_mdev_pci.c
new file mode 100644
index 0000000..e433b0a
--- /dev/null
+++ b/samples/vfio-mdev-pci/vfio_mdev_pci.c
@@ -0,0 +1,421 @@
+/*
+ * Copyright (c) 2019 Intel Corporation.
+ *     Author: Liu Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio_pci.c:
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vgaarb.h>
+#include <linux/nospec.h>
+#include <linux/mdev.h>
+
+#include <linux/vfio_pci_private.h>
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
+#define DRIVER_DESC     "VFIO Mdev PCI - Sample driver for PCI device as a mdev"
+
+#define VFIO_MDEV_PCI_NAME  "vfio-mdev-pci"
+
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio-mdev-pci driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
+
+static bool nointxmask;
+module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(nointxmask,
+		  "Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.");
+
+#ifdef CONFIG_VFIO_PCI_VGA
+static bool disable_vga;
+module_param(disable_vga, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-mdev-pci");
+#endif
+
+static bool disable_idle_d3;
+module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_idle_d3,
+		 "Disable using the PCI D3 low power state for idle, unused devices");
+
+static struct pci_driver vfio_mdev_pci_driver;
+
+struct vfio_mdev_pci_device {
+	struct vfio_pci_device vdev;
+	struct mdev_parent_ops ops;
+	struct attribute_group *groups[2];
+	struct attribute_group attr;
+	atomic_t avail;
+};
+
+static ssize_t
+available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	struct vfio_mdev_pci_device *vmdev;
+
+	vmdev = pci_get_drvdata(to_pci_dev(dev));
+
+	return sprintf(buf, "%d\n", atomic_read(&vmdev->avail));
+}
+
+MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+		char *buf)
+{
+	return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
+}
+
+MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *vfio_mdev_pci_types_attrs[] = {
+	&mdev_type_attr_device_api.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+struct vfio_mdev_pci {
+	struct vfio_pci_device *vdev;
+	struct mdev_device *mdev;
+};
+
+static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct device *pdev;
+	struct vfio_mdev_pci_device *vmdev;
+	struct vfio_mdev_pci *pmdev;
+	int ret;
+
+	pdev = mdev_parent_dev(mdev);
+	vmdev = dev_get_drvdata(pdev);
+
+	if (atomic_dec_if_positive(&vmdev->avail) < 0)
+		return -ENOSPC;
+
+	pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL);
+	if (!pmdev) {
+		atomic_inc(&vmdev->avail);
+		return -ENOMEM;
+	}
+
+	pmdev->mdev = mdev;
+	pmdev->vdev = &vmdev->vdev;
+	mdev_set_drvdata(mdev, pmdev);
+	ret = mdev_set_iommu_device(mdev_dev(mdev), pdev);
+	if (ret) {
+		pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n",
+			__func__, dev_name(mdev_dev(mdev)), dev_name(pdev));
+		kfree(pmdev);
+		atomic_inc(&vmdev->avail);
+		return ret;
+	}
+
+	pr_info("%s, creation succeeded for mdev: %s\n", __func__,
+		     dev_name(mdev_dev(mdev)));
+	return 0;
+}
+
+static int vfio_mdev_pci_remove(struct mdev_device *mdev)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+	struct vfio_mdev_pci_device *vmdev;
+
+	vmdev = container_of(pmdev->vdev, struct vfio_mdev_pci_device, vdev);
+
+	kfree(pmdev);
+	atomic_inc(&vmdev->avail);
+	pr_info("%s, succeeded for mdev: %s\n", __func__,
+		     dev_name(mdev_dev(mdev)));
+
+	return 0;
+}
+
+static int vfio_mdev_pci_open(struct mdev_device *mdev)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+	struct vfio_pci_device *vdev = pmdev->vdev;
+	int ret = 0;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3);
+
+	mutex_lock(&vdev->reflck->lock);
+
+	if (!vdev->refcnt) {
+		ret = vfio_pci_enable(vdev);
+		if (ret)
+			goto error;
+
+		vfio_spapr_pci_eeh_open(vdev->pdev);
+	}
+	vdev->refcnt++;
+error:
+	mutex_unlock(&vdev->reflck->lock);
+	if (!ret)
+		pr_info("Succeeded to open mdev: %s on pf: %s\n",
+		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
+	else {
+		pr_info("Failed to open mdev: %s on pf: %s\n",
+		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
+		module_put(THIS_MODULE);
+	}
+	return ret;
+}
+
+static void vfio_mdev_pci_release(struct mdev_device *mdev)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+	struct vfio_pci_device *vdev = pmdev->vdev;
+
+	pr_info("Release mdev: %s on pf: %s\n",
+		dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev));
+
+	mutex_lock(&vdev->reflck->lock);
+
+	if (!(--vdev->refcnt)) {
+		vfio_spapr_pci_eeh_release(vdev->pdev);
+		vfio_pci_disable(vdev);
+	}
+
+	mutex_unlock(&vdev->reflck->lock);
+
+	module_put(THIS_MODULE);
+}
+
+static long vfio_mdev_pci_ioctl(struct mdev_device *mdev, unsigned int cmd,
+			     unsigned long arg)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_ioctl(pmdev->vdev, cmd, arg);
+}
+
+static int vfio_mdev_pci_mmap(struct mdev_device *mdev,
+				struct vm_area_struct *vma)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_mmap(pmdev->vdev, vma);
+}
+
+static ssize_t vfio_mdev_pci_read(struct mdev_device *mdev, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_read(pmdev->vdev, buf, count, ppos);
+}
+
+static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev,
+				const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
+
+	return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos);
+}
+
+static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
+				       const struct pci_device_id *id)
+{
+	struct vfio_mdev_pci_device *vmdev;
+	struct vfio_pci_device *vdev;
+	const struct mdev_parent_ops *ops;
+	int ret;
+
+	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
+		return -EINVAL;
+
+	/*
+	 * Prevent binding to PFs with VFs enabled, this too easily allows
+	 * userspace instance with VFs and PFs from the same device, which
+	 * cannot work.  Disabling SR-IOV here would initiate removing the
+	 * VFs, which would unbind the driver, which is prone to blocking
+	 * if that VF is also in use by vfio-pci or vfio-mdev-pci. Just
+	 * reject these PFs and let the user sort it out.
+	 */
+	if (pci_num_vf(pdev)) {
+		pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
+		return -EBUSY;
+	}
+
+	vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL);
+	if (!vmdev)
+		return -ENOMEM;
+
+	vmdev->attr.name = kasprintf(GFP_KERNEL,
+				     "%04x:%04x:%04x:%04x:%06x:%02x",
+				     pdev->vendor, pdev->device,
+				     pdev->subsystem_vendor,
+				     pdev->subsystem_device, pdev->class,
+				     pdev->revision);
+	if (!vmdev->attr.name) {
+		kfree(vmdev);
+		return -ENOMEM;
+	}
+
+	atomic_set(&vmdev->avail, 1);
+
+	vmdev->attr.attrs = vfio_mdev_pci_types_attrs;
+	vmdev->groups[0] = &vmdev->attr;
+
+	vmdev->ops.supported_type_groups = vmdev->groups;
+	vmdev->ops.create = vfio_mdev_pci_create;
+	vmdev->ops.remove = vfio_mdev_pci_remove;
+	vmdev->ops.open	= vfio_mdev_pci_open;
+	vmdev->ops.release = vfio_mdev_pci_release;
+	vmdev->ops.read = vfio_mdev_pci_read;
+	vmdev->ops.write = vfio_mdev_pci_write;
+	vmdev->ops.mmap = vfio_mdev_pci_mmap;
+	vmdev->ops.ioctl = vfio_mdev_pci_ioctl;
+	ops = &vmdev->ops;
+
+	vdev = &vmdev->vdev;
+	vdev->pdev = pdev;
+	vdev->irq_type = VFIO_PCI_NUM_IRQS;
+	mutex_init(&vdev->igate);
+	spin_lock_init(&vdev->irqlock);
+	mutex_init(&vdev->ioeventfds_lock);
+	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	vdev->nointxmask = nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	vdev->disable_vga = disable_vga;
+#endif
+	vdev->disable_idle_d3 = disable_idle_d3;
+
+	pci_set_drvdata(pdev, vmdev);
+
+	ret = vfio_pci_reflck_attach(vdev);
+	if (ret) {
+		pci_set_drvdata(pdev, NULL);
+		kfree(vdev);
+		return ret;
+	}
+
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
+		vga_set_legacy_decoding(pdev,
+					vfio_pci_set_vga_decode(vdev, false));
+	}
+
+	vfio_pci_probe_power_state(vdev);
+
+	if (!vdev->disable_idle_d3) {
+		/*
+		 * pci-core sets the device power state to an unknown value at
+		 * bootup and after being removed from a driver.  The only
+		 * transition it allows from this unknown state is to D0, which
+		 * typically happens when a driver calls pci_enable_device().
+		 * We're not ready to enable the device yet, but we do want to
+		 * be able to get to D3.  Therefore first do a D0 transition
+		 * before going to D3.
+		 */
+		vfio_pci_set_power_state(vdev, PCI_D0);
+		vfio_pci_set_power_state(vdev, PCI_D3hot);
+	}
+
+	ret = mdev_register_device(&pdev->dev, ops);
+	if (ret)
+		pr_err("Cannot register mdev for device %s\n",
+			dev_name(&pdev->dev));
+	else
+		pr_info("Wrap device %s as a mdev\n", dev_name(&pdev->dev));
+
+	return ret;
+}
+
+static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
+{
+	struct vfio_mdev_pci_device *vmdev;
+	struct vfio_pci_device *vdev;
+
+	mdev_unregister_device(&pdev->dev);
+
+	vmdev = pci_get_drvdata(pdev);
+	if (!vmdev)
+		return;
+
+	vdev = &vmdev->vdev;
+
+	vfio_pci_reflck_put(vdev->reflck);
+
+	kfree(vdev->region);
+	mutex_destroy(&vdev->ioeventfds_lock);
+
+	if (!disable_idle_d3)
+		vfio_pci_set_power_state(vdev, PCI_D0);
+
+	kfree(vdev->pm_save);
+
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, NULL, NULL, NULL);
+		vga_set_legacy_decoding(pdev,
+				VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+				VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
+	}
+
+	kfree(vmdev->attr.name);
+	kfree(vmdev);
+}
+
+static struct pci_driver vfio_mdev_pci_driver = {
+	.name		= VFIO_MDEV_PCI_NAME,
+	.id_table	= NULL, /* only dynamic ids */
+	.probe		= vfio_mdev_pci_driver_probe,
+	.remove		= vfio_mdev_pci_driver_remove,
+	.err_handler	= &vfio_pci_err_handlers,
+};
+
+static void __exit vfio_mdev_pci_cleanup(void)
+{
+	pci_unregister_driver(&vfio_mdev_pci_driver);
+}
+
+static int __init vfio_mdev_pci_init(void)
+{
+	int ret;
+
+	/* Register and scan for devices */
+	ret = pci_register_driver(&vfio_mdev_pci_driver);
+	if (ret)
+		return ret;
+
+	vfio_pci_fill_ids(ids, &vfio_mdev_pci_driver);
+
+	return 0;
+}
+
+module_init(vfio_mdev_pci_init);
+module_exit(vfio_mdev_pci_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c
  2019-12-15 22:46   ` Alex Williamson
@ 2019-12-16 11:59     ` Liu, Yi L
  0 siblings, 0 replies; 18+ messages in thread
From: Liu, Yi L @ 2019-12-16 11:59 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kwankhede, Tian, Kevin, baolu.lu, Sun, Yi Y, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

> From: Alex Williamson < alex.williamson@redhat.com>
> Sent: Monday, December 16, 2019 6:47 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c
> 
> On Thu, 21 Nov 2019 19:23:40 +0800
> Liu Yi L <yi.l.liu@intel.com> wrote:
> 
> > This patch replaces the vfio_pci_driver reference in vfio_pci.c with
> > pci_dev_driver(vdev->pdev) which is more helpful to make the functions
> > be generic to module types.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci.c | 33 ++++++++++++++++++---------------
> >  1 file changed, 18 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index b04e43a..2096e66 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1460,24 +1460,25 @@ static void vfio_pci_reflck_get(struct vfio_pci_reflck
> *reflck)
> >
> >  static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
> >  {
> > -	struct vfio_pci_reflck **preflck = data;
> > +	struct vfio_pci_device *vdev = data;
> > +	struct vfio_pci_reflck **preflck = &vdev->reflck;
> >  	struct vfio_device *device;
> > -	struct vfio_pci_device *vdev;
> > +	struct vfio_pci_device *tmp;
> >
> >  	device = vfio_device_get_from_dev(&pdev->dev);
> >  	if (!device)
> >  		return 0;
> >
> > -	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
> > +	if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
> >  		vfio_device_put(device);
> >  		return 0;
> >  	}
> >
> > -	vdev = vfio_device_data(device);
> > +	tmp = vfio_device_data(device);
> >
> > -	if (vdev->reflck) {
> > -		vfio_pci_reflck_get(vdev->reflck);
> > -		*preflck = vdev->reflck;
> > +	if (tmp->reflck) {
> > +		vfio_pci_reflck_get(tmp->reflck);
> > +		*preflck = tmp->reflck;
> >  		vfio_device_put(device);
> >  		return 1;
> >  	}
> > @@ -1494,7 +1495,7 @@ static int vfio_pci_reflck_attach(struct vfio_pci_device
> *vdev)
> >
> >  	if (pci_is_root_bus(vdev->pdev->bus) ||
> >  	    vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find,
> > -					  &vdev->reflck, slot) <= 0)
> > +					  vdev, slot) <= 0)
> >  		vdev->reflck = vfio_pci_reflck_alloc();
> >
> >  	mutex_unlock(&reflck_lock);
> > @@ -1519,6 +1520,7 @@ static void vfio_pci_reflck_put(struct vfio_pci_reflck
> *reflck)
> >
> >  struct vfio_devices {
> >  	struct vfio_device **devices;
> > +	struct vfio_pci_device *vdev;
> >  	int cur_index;
> >  	int max_index;
> >  };
> > @@ -1527,7 +1529,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev
> *pdev, void *data)
> >  {
> >  	struct vfio_devices *devs = data;
> >  	struct vfio_device *device;
> > -	struct vfio_pci_device *vdev;
> > +	struct vfio_pci_device *tmp;
> >
> >  	if (devs->cur_index == devs->max_index)
> >  		return -ENOSPC;
> > @@ -1536,15 +1538,15 @@ static int vfio_pci_get_unused_devs(struct pci_dev
> *pdev, void *data)
> >  	if (!device)
> >  		return -EINVAL;
> >
> > -	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
> > +	if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) {
> >  		vfio_device_put(device);
> >  		return -EBUSY;
> >  	}
> >
> > -	vdev = vfio_device_data(device);
> > +	tmp = vfio_device_data(device);
> >
> >  	/* Fault if the device is not unused */
> > -	if (vdev->refcnt) {
> > +	if (tmp->refcnt) {
> >  		vfio_device_put(device);
> >  		return -EBUSY;
> >  	}
> > @@ -1590,6 +1592,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device
> *vdev)
> >  	if (!devs.devices)
> >  		return;
> >
> > +	devs.vdev = vdev;
> 
> This could be added to the declaration initializer:
> 
> struct vfio_devices devs = { .vdev = vdev, .cur_index = 0 };
> 
> It might seem a little less random then.  Thanks,

Got it. Will do it.

Thanks,
Yi Liu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
  2019-12-16 11:57     ` Liu, Yi L
@ 2019-12-16 17:42       ` Alex Williamson
  2020-01-06  5:29         ` Liu, Yi L
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2019-12-16 17:42 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kwankhede, Tian, Kevin, baolu.lu, Sun, Yi Y, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

On Mon, 16 Dec 2019 11:57:54 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Monday, December 16, 2019 6:47 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
> > 
> > On Thu, 21 Nov 2019 19:23:45 +0800
> > Liu Yi L <yi.l.liu@intel.com> wrote:
> >   
> > > This patch add a user numer track for the shared cap/ecap_perms bits,
> > > and the alloc/free will hold a semaphore to protect the operations.
> > > With the changes, first caller of vfio_pci_init_perm_bits() will
> > > initialize the bits. While the last caller of vfio_pci_uninit_perm_bits()
> > > will free the bits. This is a preparation to have multiple cap/ecap_perms
> > > bits users.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++++--
> > >  1 file changed, 31 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> > > index f0891bd..274c993 100644
> > > --- a/drivers/vfio/pci/vfio_pci_config.c
> > > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > > @@ -36,6 +36,13 @@
> > >  	 (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
> > >
> > >  /*
> > > + * vfio_perm_bits_sem: prorects the shared perm_bits alloc/free
> > > + * vfio_pci_perm_bits_users: tracks the user of the shared perm_bits
> > > + */
> > > +static DEFINE_SEMAPHORE(vfio_perm_bits_sem);
> > > +static int vfio_pci_perm_bits_users;
> > > +
> > > +/*
> > >   * Lengths of PCI Config Capabilities
> > >   *   0: Removed from the user visible capability list
> > >   *   FF: Variable length
> > > @@ -995,7 +1002,7 @@ static int __init init_pci_ext_cap_pwr_perm(struct  
> > perm_bits *perm)  
> > >  /*
> > >   * Initialize the shared permission tables
> > >   */
> > > -void vfio_pci_uninit_perm_bits(void)
> > > +static void vfio_pci_uninit_perm_bits_internal(void)
> > >  {
> > >  	free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
> > >
> > > @@ -1009,10 +1016,30 @@ void vfio_pci_uninit_perm_bits(void)
> > >  	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
> > >  }
> > >
> > > +void vfio_pci_uninit_perm_bits(void)
> > > +{
> > > +	down(&vfio_perm_bits_sem);
> > > +
> > > +	if (--vfio_pci_perm_bits_users > 0)
> > > +		goto out;
> > > +
> > > +	vfio_pci_uninit_perm_bits_internal();
> > > +
> > > +out:
> > > +	up(&vfio_perm_bits_sem);
> > > +}
> > > +
> > >  int __init vfio_pci_init_perm_bits(void)
> > >  {
> > >  	int ret;
> > >
> > > +	down(&vfio_perm_bits_sem);
> > > +
> > > +	if (++vfio_pci_perm_bits_users > 1) {
> > > +		ret = 0;
> > > +		goto out;
> > > +	}
> > > +
> > >  	/* Basic config space */
> > >  	ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);
> > >
> > > @@ -1030,8 +1057,10 @@ int __init vfio_pci_init_perm_bits(void)
> > >  	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
> > >
> > >  	if (ret)
> > > -		vfio_pci_uninit_perm_bits();
> > > +		vfio_pci_uninit_perm_bits_internal();
> > >
> > > +out:
> > > +	up(&vfio_perm_bits_sem);
> > >  	return ret;
> > >  }
> > >  
> > 
> > Hi Yi,
> > 
> > Sorry for slowness in providing feedback on this series.  If we
> > provided a vfio-pci-common module that vfio-pci and vfio-mdev-pci
> > depend on, doesn't this entire problem go away?   
> 
> I checked previous email, export the common functions out of
> vfio-pci module was proposed in RFC v2. But at that time, I didn't
> propose to have a separate module. So I guess it may be just correct
> way. Below is the reply at that time.
> 
> https://lkml.org/lkml/2019/3/19/756
> 
> > I played a little bit
> > with this in the crude patch below, it seems to work.  To finish this,
> > I think we'd move the function declarations out of the "private" header
> > file and into one under include/linux, then we could also move
> > vfio_mdev_pci.c to the samples directory like we intended originally.
> > I know you had tried to link things from samples and it didn't work,
> > but is the below a better attempt at resolving this?  It commits us to
> > exporting a bunch of functions, we'll need to decide whether that's a
> > good idea.  Thanks,  
> 
> I played with your patch and added a crude patch to move
> vfio-mdev-pci to samples directory. I can see below errors with
> either CONFIG_VFIO_PCI_COMMON=y and
> CONFIG_SAMPLE_VFIO_MDEV_PCI=y, or
> CONFIG_VFIO_PCI_COMMON=m and
> CONFIG_SAMPLE_VFIO_MDEV_PCI=m.
> 
> But CONFIG_VFIO_PCI_COMMON works well with CONFIG_VFIO_PCI...
> 
> Kernel: arch/x86/boot/bzImage is ready  (#88)
> ERROR: "vfio_pci_fill_ids" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_err_handlers" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_reflck_attach" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_write" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_disable" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_reflck_put" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_refresh_config" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_set_power_state" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_enable" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_set_vga_decode" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_probe_power_state" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_mmap" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_ioctl" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> ERROR: "vfio_pci_read" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> scripts/Makefile.modpost:93: recipe for target '__modpost' failed
> 
> So I'm afraid that it still cannot resolve the problem which we encountered
> when trying to place vfio-mdev-pci in samples/.

I'm not sure what we're doing differently (with your patch applied):

$ grep VFIO ~/build/.config | grep -v ^#
CONFIG_KVM_VFIO=y
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
CONFIG_VFIO_PCI_COMMON=m
CONFIG_VFIO_PCI=m
CONFIG_VFIO_PCI_VGA=y
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI_IGD=y
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
CONFIG_SAMPLE_VFIO_MDEV_PCI=m

$ make O=~/build -j36 > /dev/null && echo $?
0

$ grep VFIO ~/build/.config | grep -v ^#
CONFIG_KVM_VFIO=y
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO_VIRQFD=y
CONFIG_VFIO=y
CONFIG_VFIO_PCI_COMMON=y
CONFIG_VFIO_PCI=y
CONFIG_VFIO_PCI_VGA=y
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI_IGD=y
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
$ make O=~/build -j36 > /dev/null && echo $?
0

$ grep VFIO ~/build/.config | grep -v ^#
CONFIG_KVM_VFIO=y
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=y
CONFIG_VFIO_PCI_COMMON=m
CONFIG_VFIO_PCI=m
CONFIG_VFIO_PCI_VGA=y
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI_IGD=y
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
$ make O=~/build -j36 > /dev/null && echo $?
0

> ================= [the crude patch] =================
> From fa860ff15ab188481141f7bd2b9cb3a1d500f24d Mon Sep 17 00:00:00 2001
> From: Liu Yi L <yi.l.liu@intel.com>
> Date: Sun, 15 Dec 2019 19:16:54 +0800
> Subject: [PATCH] vfio/pci/sample: move vfio-pci-mdev to samples
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/Makefile             |   2 -
>  drivers/vfio/pci/vfio_mdev_pci.c      | 421 ----------------------------------
>  drivers/vfio/pci/vfio_pci.c           |   2 +-
>  drivers/vfio/pci/vfio_pci_common.c    |   2 +-
>  drivers/vfio/pci/vfio_pci_config.c    |   2 +-
>  drivers/vfio/pci/vfio_pci_igd.c       |   2 +-
>  drivers/vfio/pci/vfio_pci_intrs.c     |   2 +-
>  drivers/vfio/pci/vfio_pci_nvlink2.c   |   2 +-
>  drivers/vfio/pci/vfio_pci_private.h   | 228 ------------------
>  drivers/vfio/pci/vfio_pci_rdwr.c      |   2 +-
>  include/linux/vfio_pci_private.h      | 228 ++++++++++++++++++

Of course this would not be "private" if this is the direction we
decide to go.  Potentially there are still things private to
vfio-pci-common that we'd leave in the drivers directory though.  I'm
not entirely thrilled to expose the objects outside of vfio-pci, but
potentially if vfio-pci had a mechanism to choose an alternate
vfio_device_ops that provided vendor mediation extension when calling
vfio_add_group_dev(), and common code was sharable to vendor drivers,
it might clean up the interface Yan is proposing for adding migration
to non-mdev devices.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
  2019-12-16 17:42       ` Alex Williamson
@ 2020-01-06  5:29         ` Liu, Yi L
  0 siblings, 0 replies; 18+ messages in thread
From: Liu, Yi L @ 2020-01-06  5:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kwankhede, Tian, Kevin, baolu.lu, Sun, Yi Y, joro,
	jean-philippe.brucker, peterx, linux-kernel, kvm

> From: Alex Williamson < alex.williamson@redhat.com>
> Sent: Tuesday, December 17, 2019 1:43 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
> 
> On Mon, 16 Dec 2019 11:57:54 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Monday, December 16, 2019 6:47 AM
> > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > Subject: Re: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free
> > >
> > > On Thu, 21 Nov 2019 19:23:45 +0800
> > > Liu Yi L <yi.l.liu@intel.com> wrote:
> > >
> > > > This patch add a user numer track for the shared cap/ecap_perms bits,
> > > > and the alloc/free will hold a semaphore to protect the operations.
> > > > With the changes, first caller of vfio_pci_init_perm_bits() will
> > > > initialize the bits. While the last caller of vfio_pci_uninit_perm_bits()
> > > > will free the bits. This is a preparation to have multiple cap/ecap_perms
> > > > bits users.
> > > >
> > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > ---
> > > >  drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++++--
> > > >  1 file changed, 31 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/pci/vfio_pci_config.c
> b/drivers/vfio/pci/vfio_pci_config.c
> > > > index f0891bd..274c993 100644
> > > > --- a/drivers/vfio/pci/vfio_pci_config.c
> > > > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > > > @@ -36,6 +36,13 @@
> > > >  	 (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
> > > >
> > > >  /*
> > > > + * vfio_perm_bits_sem: prorects the shared perm_bits alloc/free
> > > > + * vfio_pci_perm_bits_users: tracks the user of the shared perm_bits
> > > > + */
> > > > +static DEFINE_SEMAPHORE(vfio_perm_bits_sem);
> > > > +static int vfio_pci_perm_bits_users;
> > > > +
> > > > +/*
> > > >   * Lengths of PCI Config Capabilities
> > > >   *   0: Removed from the user visible capability list
> > > >   *   FF: Variable length
> > > > @@ -995,7 +1002,7 @@ static int __init init_pci_ext_cap_pwr_perm(struct
> > > perm_bits *perm)
> > > >  /*
> > > >   * Initialize the shared permission tables
> > > >   */
> > > > -void vfio_pci_uninit_perm_bits(void)
> > > > +static void vfio_pci_uninit_perm_bits_internal(void)
> > > >  {
> > > >  	free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
> > > >
> > > > @@ -1009,10 +1016,30 @@ void vfio_pci_uninit_perm_bits(void)
> > > >  	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
> > > >  }
> > > >
> > > > +void vfio_pci_uninit_perm_bits(void)
> > > > +{
> > > > +	down(&vfio_perm_bits_sem);
> > > > +
> > > > +	if (--vfio_pci_perm_bits_users > 0)
> > > > +		goto out;
> > > > +
> > > > +	vfio_pci_uninit_perm_bits_internal();
> > > > +
> > > > +out:
> > > > +	up(&vfio_perm_bits_sem);
> > > > +}
> > > > +
> > > >  int __init vfio_pci_init_perm_bits(void)
> > > >  {
> > > >  	int ret;
> > > >
> > > > +	down(&vfio_perm_bits_sem);
> > > > +
> > > > +	if (++vfio_pci_perm_bits_users > 1) {
> > > > +		ret = 0;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > >  	/* Basic config space */
> > > >  	ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);
> > > >
> > > > @@ -1030,8 +1057,10 @@ int __init vfio_pci_init_perm_bits(void)
> > > >  	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
> > > >
> > > >  	if (ret)
> > > > -		vfio_pci_uninit_perm_bits();
> > > > +		vfio_pci_uninit_perm_bits_internal();
> > > >
> > > > +out:
> > > > +	up(&vfio_perm_bits_sem);
> > > >  	return ret;
> > > >  }
> > > >
> > >
> > > Hi Yi,
> > >
> > > Sorry for slowness in providing feedback on this series.  If we
> > > provided a vfio-pci-common module that vfio-pci and vfio-mdev-pci
> > > depend on, doesn't this entire problem go away?
> >
> > I checked previous email, export the common functions out of
> > vfio-pci module was proposed in RFC v2. But at that time, I didn't
> > propose to have a separate module. So I guess it may be just correct
> > way. Below is the reply at that time.
> >
> > https://lkml.org/lkml/2019/3/19/756
> >
> > > I played a little bit
> > > with this in the crude patch below, it seems to work.  To finish this,
> > > I think we'd move the function declarations out of the "private" header
> > > file and into one under include/linux, then we could also move
> > > vfio_mdev_pci.c to the samples directory like we intended originally.
> > > I know you had tried to link things from samples and it didn't work,
> > > but is the below a better attempt at resolving this?  It commits us to
> > > exporting a bunch of functions, we'll need to decide whether that's a
> > > good idea.  Thanks,
> >
> > I played with your patch and added a crude patch to move
> > vfio-mdev-pci to samples directory. I can see below errors with
> > either CONFIG_VFIO_PCI_COMMON=y and
> > CONFIG_SAMPLE_VFIO_MDEV_PCI=y, or
> > CONFIG_VFIO_PCI_COMMON=m and
> > CONFIG_SAMPLE_VFIO_MDEV_PCI=m.
> >
> > But CONFIG_VFIO_PCI_COMMON works well with CONFIG_VFIO_PCI...
> >
> > Kernel: arch/x86/boot/bzImage is ready  (#88)
> > ERROR: "vfio_pci_fill_ids" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > ERROR: "vfio_pci_err_handlers" [samples/vfio-mdev-pci/vfio-mdev-pci.ko]
> undefined!
> > ERROR: "vfio_pci_reflck_attach" [samples/vfio-mdev-pci/vfio-mdev-pci.ko]
> undefined!
> > ERROR: "vfio_pci_write" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > ERROR: "vfio_pci_disable" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > ERROR: "vfio_pci_reflck_put" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > ERROR: "vfio_pci_refresh_config" [samples/vfio-mdev-pci/vfio-mdev-pci.ko]
> undefined!
> > ERROR: "vfio_pci_set_power_state" [samples/vfio-mdev-pci/vfio-mdev-pci.ko]
> undefined!
> > ERROR: "vfio_pci_enable" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > ERROR: "vfio_pci_set_vga_decode" [samples/vfio-mdev-pci/vfio-mdev-pci.ko]
> undefined!
> > ERROR: "vfio_pci_probe_power_state" [samples/vfio-mdev-pci/vfio-mdev-pci.ko]
> undefined!
> > ERROR: "vfio_pci_mmap" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > ERROR: "vfio_pci_ioctl" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > ERROR: "vfio_pci_read" [samples/vfio-mdev-pci/vfio-mdev-pci.ko] undefined!
> > scripts/Makefile.modpost:93: recipe for target '__modpost' failed
> >
> > So I'm afraid that it still cannot resolve the problem which we encountered
> > when trying to place vfio-mdev-pci in samples/.
> 
> I'm not sure what we're doing differently (with your patch applied):
> 
> $ grep VFIO ~/build/.config | grep -v ^#
> CONFIG_KVM_VFIO=y
> CONFIG_VFIO_IOMMU_TYPE1=m
> CONFIG_VFIO_VIRQFD=m
> CONFIG_VFIO=m
> CONFIG_VFIO_PCI_COMMON=m
> CONFIG_VFIO_PCI=m
> CONFIG_VFIO_PCI_VGA=y
> CONFIG_VFIO_PCI_MMAP=y
> CONFIG_VFIO_PCI_INTX=y
> CONFIG_VFIO_PCI_IGD=y
> CONFIG_VFIO_MDEV=m
> CONFIG_VFIO_MDEV_DEVICE=m
> CONFIG_SAMPLE_VFIO_MDEV_PCI=m

Hi Alex,

When facing the error in previous emai, I'm using the below
.config. I didn't compile VFIO_PCI. This may be the major difference.
And I think I found the root cause. When trying to build vfio_mdev_pci
into kernel image, it requires vfio_pci_common.ko. However, if VFIO_PCI
is not configured, the drivers/vfio/pci directory is not built. It can be fixed
by either add " obj-$(CONFIG_VFIO_PCI_COMMON) += pci/"
in drivers/vfio/Makefile or add " select VFIO_PCI" for SAMPLE_VFIO_MDEV_PCI
in samples/Kconfig. So I'll move the vfio_mdev_pci module to samples as
you suggested. Let me cook another version. :-)

yiliu@yiliu-dev:~/vfio-mdev-pci-driver/linux$ grep VFIO .config | grep -v ^#
CONFIG_KVM_VFIO=y
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=y
CONFIG_VFIO_PCI_COMMON=m
CONFIG_VFIO_MDEV=y
CONFIG_VFIO_MDEV_DEVICE=y
CONFIG_SAMPLE_VFIO_MDEV_PCI=m

> $ make O=~/build -j36 > /dev/null && echo $?
> 0
> 
> $ grep VFIO ~/build/.config | grep -v ^#
> CONFIG_KVM_VFIO=y
> CONFIG_VFIO_IOMMU_TYPE1=y
> CONFIG_VFIO_VIRQFD=y
> CONFIG_VFIO=y
> CONFIG_VFIO_PCI_COMMON=y
> CONFIG_VFIO_PCI=y
> CONFIG_VFIO_PCI_VGA=y
> CONFIG_VFIO_PCI_MMAP=y
> CONFIG_VFIO_PCI_INTX=y
> CONFIG_VFIO_PCI_IGD=y
> CONFIG_VFIO_MDEV=m
> CONFIG_VFIO_MDEV_DEVICE=m
> $ make O=~/build -j36 > /dev/null && echo $?
> 0
> 
> $ grep VFIO ~/build/.config | grep -v ^#
> CONFIG_KVM_VFIO=y
> CONFIG_VFIO_IOMMU_TYPE1=y
> CONFIG_VFIO_VIRQFD=m
> CONFIG_VFIO=y
> CONFIG_VFIO_PCI_COMMON=m
> CONFIG_VFIO_PCI=m
> CONFIG_VFIO_PCI_VGA=y
> CONFIG_VFIO_PCI_MMAP=y
> CONFIG_VFIO_PCI_INTX=y
> CONFIG_VFIO_PCI_IGD=y
> CONFIG_VFIO_MDEV=m
> CONFIG_VFIO_MDEV_DEVICE=m
> $ make O=~/build -j36 > /dev/null && echo $?
> 0
> 
> > ================= [the crude patch] =================
> > From fa860ff15ab188481141f7bd2b9cb3a1d500f24d Mon Sep 17 00:00:00 2001
> > From: Liu Yi L <yi.l.liu@intel.com>
> > Date: Sun, 15 Dec 2019 19:16:54 +0800
> > Subject: [PATCH] vfio/pci/sample: move vfio-pci-mdev to samples
> >
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/Makefile             |   2 -
> >  drivers/vfio/pci/vfio_mdev_pci.c      | 421 ----------------------------------
> >  drivers/vfio/pci/vfio_pci.c           |   2 +-
> >  drivers/vfio/pci/vfio_pci_common.c    |   2 +-
> >  drivers/vfio/pci/vfio_pci_config.c    |   2 +-
> >  drivers/vfio/pci/vfio_pci_igd.c       |   2 +-
> >  drivers/vfio/pci/vfio_pci_intrs.c     |   2 +-
> >  drivers/vfio/pci/vfio_pci_nvlink2.c   |   2 +-
> >  drivers/vfio/pci/vfio_pci_private.h   | 228 ------------------
> >  drivers/vfio/pci/vfio_pci_rdwr.c      |   2 +-
> >  include/linux/vfio_pci_private.h      | 228 ++++++++++++++++++
> 
> Of course this would not be "private" if this is the direction we
> decide to go. 

yes, if we are going this way in the end, it should definitely be split into
a "private" one and non "private" one.

> Potentially there are still things private to
> vfio-pci-common that we'd leave in the drivers directory though.  I'm
> not entirely thrilled to expose the objects outside of vfio-pci,

Will have a private .h to vfio-pci-common.ko :-)

> but
> potentially if vfio-pci had a mechanism to choose an alternate
> vfio_device_ops that provided vendor mediation extension when calling
> vfio_add_group_dev(), and common code was sharable to vendor drivers,
> it might clean up the interface Yan is proposing for adding migration
> to non-mdev devices.  Thanks,

I think it is doable as long as common code are sharable to vendor drivers.
Or maybe vendor driver just choose to consume vfio pci common codes.
Thanks for your suggestions, let me work out another version.

> 
> Alex

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-01-06  5:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-21 11:23 [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu Yi L
2019-11-21 11:23 ` [PATCH v3 01/10] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header Liu Yi L
2019-11-21 11:23 ` [PATCH v3 02/10] vfio_pci: refine user config reference in vfio-pci module Liu Yi L
2019-11-21 11:23 ` [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c Liu Yi L
2019-12-15 22:46   ` Alex Williamson
2019-12-16 11:59     ` Liu, Yi L
2019-11-21 11:23 ` [PATCH v3 04/10] vfio_pci: make common functions be extern Liu Yi L
2019-11-21 11:23 ` [PATCH v3 05/10] vfio_pci: duplicate vfio_pci.c Liu Yi L
2019-11-21 11:23 ` [PATCH v3 06/10] vfio_pci: shrink vfio_pci_common.c Liu Yi L
2019-11-21 11:23 ` [PATCH v3 07/10] vfio_pci: shrink vfio_pci.c Liu Yi L
2019-11-21 11:23 ` [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free Liu Yi L
2019-12-15 22:46   ` Alex Williamson
2019-12-16 11:57     ` Liu, Yi L
2019-12-16 17:42       ` Alex Williamson
2020-01-06  5:29         ` Liu, Yi L
2019-11-21 11:23 ` [PATCH v3 09/10] samples: add vfio-mdev-pci driver Liu Yi L
2019-11-21 11:23 ` [PATCH v3 10/10] samples: refine " Liu Yi L
2019-12-09 11:49 ` [PATCH v3 00/10] vfio_pci: wrap pci device as a mediated device Liu, Yi L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).