All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] VFIO SR-IOV support
@ 2016-06-09 12:09 Ilya Lesokhin
  2016-06-09 12:09 ` [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-09 12:09 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Changes from RFC V2:
        1. pci_disable_sriov() is now called from a workqueue
        To avoid the situation where a process is blocked
        in pci_disable_sriov() wating for itself to relase the VFs.
        2. a mutex was added to synchronize calls to
        pci_enable_sriov() and pci_disable_sriov()

Changes from RFC V1:
        Due to the security concern raised in RFC V1, we add two patches
        to make sure the VFs belong to the same IOMMU group as
        the PF and are probed by VFIO.

Today the QEMU hypervisor allows assigning a physical device to a VM,
facilitating driver development. However, it does not support enabling
SR-IOV by the VM kernel driver. Our goal is to implement such support,
allowing developers working on SR-IOV physical function drivers to work
inside VMs as well.

This patch series implements the kernel side of our solution.  It extends
the VFIO driver to support the PCIE SRIOV extended capability with
following features:
1. The ability to probe SR-IOV BAR sizes.
2. The ability to enable and disable SR-IOV.

This patch series is going to be used by QEMU to expose SR-IOV capabilities
to VM. We already have an early prototype based on Knut Omang's patches for
SR-IOV[1].

Limitations:
1. Per SR-IOV spec section 3.3.12, PFs are required to support
4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes.
Unfourtently the kernel currently initializes the System Page Size register once
and assumes it doesn't change therefore we cannot allow guests to change this
register at will. We currently map both the Supported Page sizes and
System Page Size as virtualized and read only in violation of the spec.
In practice this is not an issue since both the hypervisor and the
guest typically select the same System Page Size.

[1] https://github.com/knuto/qemu/tree/sriov_patches_v6

Ilya Lesokhin (4):
  VFIO: Probe new devices in a live VFIO group with the VFIO driver
  IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU
    group
  PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  VFIO: Add support for SR-IOV extended capablity

 drivers/iommu/iommu.c               |   4 +
 drivers/pci/iov.c                   |   4 +-
 drivers/vfio/pci/vfio_pci.c         |   4 +
 drivers/vfio/pci/vfio_pci_config.c  | 208 +++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   1 +
 drivers/vfio/vfio.c                 |  18 +++-
 include/linux/pci.h                 |   7 ++
 7 files changed, 224 insertions(+), 22 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver
  2016-06-09 12:09 [PATCH 0/4] VFIO SR-IOV support Ilya Lesokhin
@ 2016-06-09 12:09 ` Ilya Lesokhin
  2016-06-09 22:21   ` Alex Williamson
  2016-06-09 12:09 ` [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-09 12:09 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

When a new device is added to a live VFIO group,
use driver_override to force the device to be probed by
the VFIO driver.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
---
 drivers/vfio/vfio.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6fd6fa5..32bc1b4 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -646,6 +646,7 @@ static int vfio_dev_viable(struct device *dev, void *data)
 static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
 {
 	struct vfio_device *device;
+	struct pci_dev *pdev;
 
 	/* Do we already know about it?  We shouldn't */
 	device = vfio_group_get_device(group, dev);
@@ -658,10 +659,19 @@ static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
 	if (!atomic_read(&group->container_users))
 		return 0;
 
-	/* TODO Prevent device auto probing */
-	WARN(1, "Device %s added to live group %d!\n", dev_name(dev),
-	     iommu_group_id(group->iommu_group));
-
+	if (dev_is_pci(dev)) {
+		/* a device that shares a group with another VFIO
+		 * device should also be probed by VFIO
+		 */
+		pdev = to_pci_dev(dev);
+		if (pdev->driver_override)
+			kfree(pdev->driver_override);
+		pdev->driver_override = kstrdup("vfio-pci", GFP_KERNEL);
+	} else {
+		/* TODO Prevent device auto probing */
+		WARN(1, "Device %s added to live group %d!\n", dev_name(dev),
+		     iommu_group_id(group->iommu_group));
+	}
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-06-09 12:09 [PATCH 0/4] VFIO SR-IOV support Ilya Lesokhin
  2016-06-09 12:09 ` [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
@ 2016-06-09 12:09 ` Ilya Lesokhin
  2016-06-09 12:56   ` kbuild test robot
                     ` (2 more replies)
  2016-06-09 12:09 ` [PATCH 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-09 12:09 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Add a new PCI_DEV_FLAGS_UNTRUSTED to indicate that a PCI device
is probed by a driver that gives untrusted entities access to that device.
Make iommu_group_get_for_pci_dev check the new flag when an IOMMU
group is selected for a virtual function.
Mark VFIO devices with the new flag.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
---
 drivers/iommu/iommu.c       | 4 ++++
 drivers/vfio/pci/vfio_pci.c | 3 +++
 include/linux/pci.h         | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3000051..9bb914c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -749,6 +749,10 @@ struct iommu_group *pci_device_group(struct device *dev)
 	struct pci_bus *bus;
 	struct iommu_group *group = NULL;
 	u64 devfns[4] = { 0 };
+	
+	if (pdev->is_virtfn && 
+	   (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
+		return iommu_group_get(&pdev->physfn->dev);
 
 	if (WARN_ON(!dev_is_pci(dev)))
 		return ERR_PTR(-EINVAL);
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 188b1ff..72d048e 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1180,6 +1180,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		pci_set_power_state(pdev, PCI_D3hot);
 	}
 
+	pdev->dev_flags |= PCI_DEV_FLAGS_UNTRUSTED;
+
 	return ret;
 }
 
@@ -1187,6 +1189,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 {
 	struct vfio_pci_device *vdev;
 
+	pdev->dev_flags &= ~PCI_DEV_FLAGS_UNTRUSTED;
 	vdev = vfio_del_group_dev(&pdev->dev);
 	if (!vdev)
 		return;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b67e4df..bef9115 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -174,6 +174,9 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/* Untrusted software controls this device
+	 * The VFs of this device should be put in the device's IOMMUs group*/
+	PCI_DEV_FLAGS_UNTRUSTED = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  2016-06-09 12:09 [PATCH 0/4] VFIO SR-IOV support Ilya Lesokhin
  2016-06-09 12:09 ` [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
  2016-06-09 12:09 ` [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
@ 2016-06-09 12:09 ` Ilya Lesokhin
  2016-06-09 12:27   ` kbuild test robot
  2016-06-09 12:09 ` [PATCH 4/4] VFIO: Add support for SR-IOV extended capablity Ilya Lesokhin
  2016-06-09 12:09 ` [PATCH 4/4] VFIO: Add support for SRIOV " Ilya Lesokhin
  4 siblings, 1 reply; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-09 12:09 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Expose iov_set_numvfs and iov_resource_size to make them available
for VFIO-PCI sriov support.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/pci/iov.c   | 4 +++-
 include/linux/pci.h | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2194b44..78b0f3f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -41,7 +41,7 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
  *
  * Update iov->offset and iov->stride when NumVFs is written.
  */
-static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 {
 	struct pci_sriov *iov = dev->sriov;
 
@@ -49,6 +49,7 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
+EXPORT_SYMBOL(pci_iov_set_numvfs);
 
 /*
  * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
@@ -112,6 +113,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 
 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
+EXPORT_SYMBOL(pci_iov_resource_size);
 
 int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index bef9115..f4fd4b8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1751,6 +1751,8 @@ int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
+
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn);
 #else
 static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
@@ -1778,6 +1780,8 @@ static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 { return 0; }
+
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) { }
 #endif
 
 #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] VFIO: Add support for SR-IOV extended capablity
  2016-06-09 12:09 [PATCH 0/4] VFIO SR-IOV support Ilya Lesokhin
                   ` (2 preceding siblings ...)
  2016-06-09 12:09 ` [PATCH 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
@ 2016-06-09 12:09 ` Ilya Lesokhin
  2016-06-09 12:09 ` [PATCH 4/4] VFIO: Add support for SRIOV " Ilya Lesokhin
  4 siblings, 0 replies; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-09 12:09 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Add support for PCIE SR-IOV extended capablity with following features:
1. The ability to probe SR-IOV BAR sizes.
2. The ability to enable and disable SR-IOV.

Since pci_enable_sriov and pci_disable_sriov are not thread-safe
a new mutex was added to vfio_pci_device to protect those function.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/vfio/pci/vfio_pci.c         |   1 +
 drivers/vfio/pci/vfio_pci_config.c  | 208 +++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   1 +
 3 files changed, 193 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 72d048e..4fb2a93 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1151,6 +1151,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	vdev->pdev = pdev;
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	mutex_init(&vdev->igate);
+	mutex_init(&vdev->sriov_mutex);
 	spin_lock_init(&vdev->irqlock);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 688691d..cea1503 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -448,6 +448,35 @@ static __le32 vfio_generate_bar_flags(struct pci_dev *pdev, int bar)
 	return cpu_to_le32(val);
 }
 
+static void vfio_sriov_bar_fixup(struct vfio_pci_device *vdev,
+				 int sriov_cap_start)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int i;
+	__le32 *bar;
+	u64 mask;
+
+	bar = (__le32 *)&vdev->vconfig[sriov_cap_start + PCI_SRIOV_BAR];
+
+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
+		if (!pci_resource_start(pdev, i)) {
+			*bar = 0; /* Unmapped by host = unimplemented to user */
+			continue;
+		}
+
+		mask = ~(pci_iov_resource_size(pdev, i) - 1);
+
+		*bar &= cpu_to_le32((u32)mask);
+		*bar |= vfio_generate_bar_flags(pdev, i);
+
+		if (*bar & cpu_to_le32(PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+			bar++;
+			*bar &= cpu_to_le32((u32)(mask >> 32));
+			i++;
+		}
+	}
+}
+
 /*
  * Pretend we're hardware and tweak the values of the *virtual* PCI BARs
  * to reflect the hardware capabilities.  This implements BAR sizing.
@@ -901,6 +930,163 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 	return 0;
 }
 
+static int __init init_pci_ext_cap_sriov_perm(struct perm_bits *perm)
+{
+	int i;
+
+	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_SRIOV]))
+		return -ENOMEM;
+
+	/*
+	 * Virtualize the first dword of all express capabilities
+	 * because it includes the next pointer.  This lets us later
+	 * remove capabilities from the chain if we need to.
+	 */
+	p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+
+	/* VF Enable - Virtualized and writable
+	 * Memory Space Enable - Non-virtualized and writable
+	 */
+	p_setw(perm, PCI_SRIOV_CTRL, PCI_SRIOV_CTRL_VFE,
+	       PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
+
+	p_setw(perm, PCI_SRIOV_NUM_VF, (u16)ALL_VIRT, (u16)ALL_WRITE);
+	p_setw(perm, PCI_SRIOV_SUP_PGSIZE, (u16)ALL_VIRT, 0);
+
+	/* We cannot let user space application change the page size
+	 * so we mark it as read only and trust the user application
+	 * (e.g. qemu) to virtualize this correctly for the guest
+	 */
+	p_setw(perm, PCI_SRIOV_SYS_PGSIZE, (u16)ALL_VIRT, 0);
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+		p_setd(perm, PCI_SRIOV_BAR + 4 * i, ALL_VIRT, ALL_WRITE);
+
+	return 0;
+}
+
+static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
+{
+	u8 cap;
+	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
+						 PCI_STD_HEADER_SIZEOF;
+	cap = vdev->pci_config_map[pos];
+
+	if (cap == PCI_CAP_ID_BASIC)
+		return 0;
+
+	/* XXX Can we have to abutting capabilities of the same type? */
+	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
+		pos--;
+
+	return pos;
+}
+
+static int vfio_sriov_cap_config_read(struct vfio_pci_device *vdev, int pos,
+				      int count, struct perm_bits *perm,
+				       int offset, __le32 *val)
+{
+	int cap_start = vfio_find_cap_start(vdev, pos);
+
+	vfio_sriov_bar_fixup(vdev, cap_start);
+	return vfio_default_config_read(vdev, pos, count, perm, offset, val);
+}
+
+struct disable_sriov_data {
+	struct work_struct work;
+	struct vfio_pci_device *vdev;
+};
+
+static void vfio_disable_sriov_work(struct work_struct *work)
+{
+	struct disable_sriov_data *data =
+		container_of(work, struct disable_sriov_data, work);
+
+	mutex_lock(&data->vdev->sriov_mutex);
+	pci_disable_sriov(data->vdev->pdev);
+	mutex_unlock(&data->vdev->sriov_mutex);
+}
+
+/* pci_disable_sriov may block waiting for all the VF drivers to unload.
+ * As a result, If a process was allowed to call pci_disable_sriov()
+ * directly it could become unkillable by calling pci_disable_sriov()
+ * while holding a reference to one of the VFs.
+ * To address this issue we to the call to pci_disable_sriov
+ * on a kernel thread the can't hold references on the VFs.
+ */
+static inline int vfio_disable_sriov(struct vfio_pci_device *vdev)
+{
+	struct disable_sriov_data *data = kmalloc(sizeof(*data), GFP_USER);
+
+	if (!data)
+		return -ENOMEM;
+
+	INIT_WORK(&data->work, vfio_disable_sriov_work);
+	data->vdev = vdev;
+
+	schedule_work(&data->work);
+	return 0;
+}
+
+static int vfio_sriov_cap_config_write(struct vfio_pci_device *vdev, int pos,
+				       int count, struct perm_bits *perm,
+				       int offset, __le32 val)
+{
+	int ret;
+	int cap_start = vfio_find_cap_start(vdev, pos);
+	u16 sriov_ctrl = *(u16 *)(vdev->vconfig + cap_start + PCI_SRIOV_CTRL);
+	bool cur_vf_enabled = sriov_ctrl & PCI_SRIOV_CTRL_VFE;
+	bool vf_enabled;
+
+	switch (offset) {
+	case  PCI_SRIOV_NUM_VF:
+	/* Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset
+	 * and VF Stride may change when NumVFs changes.
+	 *
+	 * Therefore we should pass valid writes to the hardware.
+	 *
+	 * Per SR-IOV spec sec 3.3.7
+	 * The results are undefined if NumVFs is set to a value greater
+	 * than TotalVFs.
+	 * NumVFs may only be written while VF Enable is Clear.
+	 * If NumVFs is written when VF Enable is Set, the results
+	 * are undefined.
+
+	 * Avoid passing such writes to the Hardware just in case.
+	 */
+		if (cur_vf_enabled ||
+		    val > pci_sriov_get_totalvfs(vdev->pdev))
+			return count;
+
+		pci_iov_set_numvfs(vdev->pdev, val);
+		break;
+
+	case PCI_SRIOV_CTRL:
+		vf_enabled = val & PCI_SRIOV_CTRL_VFE;
+		ret = 0;
+
+		if (!cur_vf_enabled && vf_enabled) {
+			u16 num_vfs = *(u16 *)(vdev->vconfig +
+					cap_start +
+					PCI_SRIOV_NUM_VF);
+			mutex_lock(&vdev->sriov_mutex);
+			ret = pci_enable_sriov(vdev->pdev, num_vfs);
+			mutex_unlock(&vdev->sriov_mutex);
+		} else if (cur_vf_enabled && !vf_enabled) {
+			ret = vfio_disable_sriov(vdev);
+		}
+		if (ret)
+			return ret;
+		break;
+
+	default:
+		break;
+	}
+
+	return vfio_default_config_write(vdev, pos, count, perm,
+					 offset, val);
+}
+
 /*
  * Initialize the shared permission tables
  */
@@ -916,6 +1102,7 @@ void vfio_pci_uninit_perm_bits(void)
 
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
 }
 
 int __init vfio_pci_init_perm_bits(void)
@@ -938,29 +1125,16 @@ int __init vfio_pci_init_perm_bits(void)
 	ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 
+	ret |= init_pci_ext_cap_sriov_perm(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].readfn = vfio_sriov_cap_config_read;
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].writefn = vfio_sriov_cap_config_write;
+
 	if (ret)
 		vfio_pci_uninit_perm_bits();
 
 	return ret;
 }
 
-static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
-{
-	u8 cap;
-	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
-						 PCI_STD_HEADER_SIZEOF;
-	cap = vdev->pci_config_map[pos];
-
-	if (cap == PCI_CAP_ID_BASIC)
-		return 0;
-
-	/* XXX Can we have to abutting capabilities of the same type? */
-	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
-		pos--;
-
-	return pos;
-}
-
 static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 *val)
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 016c14a..3be9a61 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -88,6 +88,7 @@ struct vfio_pci_device {
 	int			refcnt;
 	struct eventfd_ctx	*err_trigger;
 	struct eventfd_ctx	*req_trigger;
+	struct mutex		sriov_mutex;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] VFIO: Add support for SRIOV extended capablity
  2016-06-09 12:09 [PATCH 0/4] VFIO SR-IOV support Ilya Lesokhin
                   ` (3 preceding siblings ...)
  2016-06-09 12:09 ` [PATCH 4/4] VFIO: Add support for SR-IOV extended capablity Ilya Lesokhin
@ 2016-06-09 12:09 ` Ilya Lesokhin
  4 siblings, 0 replies; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-09 12:09 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Add support for PCIE SRIOV extended capablity with following features:
1. The ability to probe SRIOV BAR sizes.
2. The ability to enable and disable sriov.

Since pci_enable_sriov and pci_disable_sriov are not thread-safe
a new mutex was added to vfio_pci_device to protect those function.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/vfio/pci/vfio_pci.c         |   1 +
 drivers/vfio/pci/vfio_pci_config.c  | 208 +++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   1 +
 3 files changed, 193 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 72d048e..4fb2a93 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1151,6 +1151,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	vdev->pdev = pdev;
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	mutex_init(&vdev->igate);
+	mutex_init(&vdev->sriov_mutex);
 	spin_lock_init(&vdev->irqlock);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 688691d..cea1503 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -448,6 +448,35 @@ static __le32 vfio_generate_bar_flags(struct pci_dev *pdev, int bar)
 	return cpu_to_le32(val);
 }
 
+static void vfio_sriov_bar_fixup(struct vfio_pci_device *vdev,
+				 int sriov_cap_start)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int i;
+	__le32 *bar;
+	u64 mask;
+
+	bar = (__le32 *)&vdev->vconfig[sriov_cap_start + PCI_SRIOV_BAR];
+
+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
+		if (!pci_resource_start(pdev, i)) {
+			*bar = 0; /* Unmapped by host = unimplemented to user */
+			continue;
+		}
+
+		mask = ~(pci_iov_resource_size(pdev, i) - 1);
+
+		*bar &= cpu_to_le32((u32)mask);
+		*bar |= vfio_generate_bar_flags(pdev, i);
+
+		if (*bar & cpu_to_le32(PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+			bar++;
+			*bar &= cpu_to_le32((u32)(mask >> 32));
+			i++;
+		}
+	}
+}
+
 /*
  * Pretend we're hardware and tweak the values of the *virtual* PCI BARs
  * to reflect the hardware capabilities.  This implements BAR sizing.
@@ -901,6 +930,163 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 	return 0;
 }
 
+static int __init init_pci_ext_cap_sriov_perm(struct perm_bits *perm)
+{
+	int i;
+
+	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_SRIOV]))
+		return -ENOMEM;
+
+	/*
+	 * Virtualize the first dword of all express capabilities
+	 * because it includes the next pointer.  This lets us later
+	 * remove capabilities from the chain if we need to.
+	 */
+	p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+
+	/* VF Enable - Virtualized and writable
+	 * Memory Space Enable - Non-virtualized and writable
+	 */
+	p_setw(perm, PCI_SRIOV_CTRL, PCI_SRIOV_CTRL_VFE,
+	       PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
+
+	p_setw(perm, PCI_SRIOV_NUM_VF, (u16)ALL_VIRT, (u16)ALL_WRITE);
+	p_setw(perm, PCI_SRIOV_SUP_PGSIZE, (u16)ALL_VIRT, 0);
+
+	/* We cannot let user space application change the page size
+	 * so we mark it as read only and trust the user application
+	 * (e.g. qemu) to virtualize this correctly for the guest
+	 */
+	p_setw(perm, PCI_SRIOV_SYS_PGSIZE, (u16)ALL_VIRT, 0);
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+		p_setd(perm, PCI_SRIOV_BAR + 4 * i, ALL_VIRT, ALL_WRITE);
+
+	return 0;
+}
+
+static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
+{
+	u8 cap;
+	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
+						 PCI_STD_HEADER_SIZEOF;
+	cap = vdev->pci_config_map[pos];
+
+	if (cap == PCI_CAP_ID_BASIC)
+		return 0;
+
+	/* XXX Can we have to abutting capabilities of the same type? */
+	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
+		pos--;
+
+	return pos;
+}
+
+static int vfio_sriov_cap_config_read(struct vfio_pci_device *vdev, int pos,
+				      int count, struct perm_bits *perm,
+				       int offset, __le32 *val)
+{
+	int cap_start = vfio_find_cap_start(vdev, pos);
+
+	vfio_sriov_bar_fixup(vdev, cap_start);
+	return vfio_default_config_read(vdev, pos, count, perm, offset, val);
+}
+
+struct disable_sriov_data {
+	struct work_struct work;
+	struct vfio_pci_device *vdev;
+};
+
+static void vfio_disable_sriov_work(struct work_struct *work)
+{
+	struct disable_sriov_data *data =
+		container_of(work, struct disable_sriov_data, work);
+
+	mutex_lock(&data->vdev->sriov_mutex);
+	pci_disable_sriov(data->vdev->pdev);
+	mutex_unlock(&data->vdev->sriov_mutex);
+}
+
+/* pci_disable_sriov may block waiting for all the VF drivers to unload.
+ * As a result, If a process was allowed to call pci_disable_sriov()
+ * directly it could become unkillable by calling pci_disable_sriov()
+ * while holding a reference to one of the VFs.
+ * To address this issue we to the call to pci_disable_sriov
+ * on a kernel thread the can't hold references on the VFs.
+ */
+static inline int vfio_disable_sriov(struct vfio_pci_device *vdev)
+{
+	struct disable_sriov_data *data = kmalloc(sizeof(*data), GFP_USER);
+
+	if (!data)
+		return -ENOMEM;
+
+	INIT_WORK(&data->work, vfio_disable_sriov_work);
+	data->vdev = vdev;
+
+	schedule_work(&data->work);
+	return 0;
+}
+
+static int vfio_sriov_cap_config_write(struct vfio_pci_device *vdev, int pos,
+				       int count, struct perm_bits *perm,
+				       int offset, __le32 val)
+{
+	int ret;
+	int cap_start = vfio_find_cap_start(vdev, pos);
+	u16 sriov_ctrl = *(u16 *)(vdev->vconfig + cap_start + PCI_SRIOV_CTRL);
+	bool cur_vf_enabled = sriov_ctrl & PCI_SRIOV_CTRL_VFE;
+	bool vf_enabled;
+
+	switch (offset) {
+	case  PCI_SRIOV_NUM_VF:
+	/* Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset
+	 * and VF Stride may change when NumVFs changes.
+	 *
+	 * Therefore we should pass valid writes to the hardware.
+	 *
+	 * Per SR-IOV spec sec 3.3.7
+	 * The results are undefined if NumVFs is set to a value greater
+	 * than TotalVFs.
+	 * NumVFs may only be written while VF Enable is Clear.
+	 * If NumVFs is written when VF Enable is Set, the results
+	 * are undefined.
+
+	 * Avoid passing such writes to the Hardware just in case.
+	 */
+		if (cur_vf_enabled ||
+		    val > pci_sriov_get_totalvfs(vdev->pdev))
+			return count;
+
+		pci_iov_set_numvfs(vdev->pdev, val);
+		break;
+
+	case PCI_SRIOV_CTRL:
+		vf_enabled = val & PCI_SRIOV_CTRL_VFE;
+		ret = 0;
+
+		if (!cur_vf_enabled && vf_enabled) {
+			u16 num_vfs = *(u16 *)(vdev->vconfig +
+					cap_start +
+					PCI_SRIOV_NUM_VF);
+			mutex_lock(&vdev->sriov_mutex);
+			ret = pci_enable_sriov(vdev->pdev, num_vfs);
+			mutex_unlock(&vdev->sriov_mutex);
+		} else if (cur_vf_enabled && !vf_enabled) {
+			ret = vfio_disable_sriov(vdev);
+		}
+		if (ret)
+			return ret;
+		break;
+
+	default:
+		break;
+	}
+
+	return vfio_default_config_write(vdev, pos, count, perm,
+					 offset, val);
+}
+
 /*
  * Initialize the shared permission tables
  */
@@ -916,6 +1102,7 @@ void vfio_pci_uninit_perm_bits(void)
 
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
 }
 
 int __init vfio_pci_init_perm_bits(void)
@@ -938,29 +1125,16 @@ int __init vfio_pci_init_perm_bits(void)
 	ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 
+	ret |= init_pci_ext_cap_sriov_perm(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].readfn = vfio_sriov_cap_config_read;
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].writefn = vfio_sriov_cap_config_write;
+
 	if (ret)
 		vfio_pci_uninit_perm_bits();
 
 	return ret;
 }
 
-static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
-{
-	u8 cap;
-	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
-						 PCI_STD_HEADER_SIZEOF;
-	cap = vdev->pci_config_map[pos];
-
-	if (cap == PCI_CAP_ID_BASIC)
-		return 0;
-
-	/* XXX Can we have to abutting capabilities of the same type? */
-	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
-		pos--;
-
-	return pos;
-}
-
 static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 *val)
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 016c14a..3be9a61 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -88,6 +88,7 @@ struct vfio_pci_device {
 	int			refcnt;
 	struct eventfd_ctx	*err_trigger;
 	struct eventfd_ctx	*req_trigger;
+	struct mutex		sriov_mutex;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  2016-06-09 12:09 ` [PATCH 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
@ 2016-06-09 12:27   ` kbuild test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-09 12:27 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kbuild-all, kvm, linux-pci, bhelgaas, alex.williamson, noaos,
	haggaie, ogerlitz, liranl, ilyal

[-- Attachment #1: Type: text/plain, Size: 951 bytes --]

Hi,

[auto build test ERROR on vfio/next]
[also build test ERROR on v4.7-rc2 next-20160609]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/VFIO-SR-IOV-support/20160609-202117
base:   https://github.com/awilliam/linux-vfio.git next
config: i386-randconfig-i0-06080914 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   arch/x86/kernel/apic/htirq.o: In function `pci_iov_set_numvfs':
>> htirq.c:(.text+0x440): multiple definition of `pci_iov_set_numvfs'
   arch/x86/kernel/apic/io_apic.o:(.text+0x2180): first defined here

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24886 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-06-09 12:09 ` [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
@ 2016-06-09 12:56   ` kbuild test robot
  2016-06-09 15:21   ` kbuild test robot
  2016-06-09 22:21   ` Alex Williamson
  2 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-09 12:56 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kbuild-all, kvm, linux-pci, bhelgaas, alex.williamson, noaos,
	haggaie, ogerlitz, liranl, ilyal

[-- Attachment #1: Type: text/plain, Size: 1684 bytes --]

Hi,

[auto build test ERROR on vfio/next]
[also build test ERROR on v4.7-rc2 next-20160609]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/VFIO-SR-IOV-support/20160609-202117
base:   https://github.com/awilliam/linux-vfio.git next
config: s390-default_defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

All errors (new ones prefixed by >>):

   drivers/iommu/iommu.c: In function 'pci_device_group':
>> drivers/iommu/iommu.c:758:10: error: 'struct pci_dev' has no member named 'physfn'
        (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
             ^
   drivers/iommu/iommu.c:759:31: error: 'struct pci_dev' has no member named 'physfn'
      return iommu_group_get(&pdev->physfn->dev);
                                  ^

vim +758 drivers/iommu/iommu.c

   752		struct group_for_pci_data data;
   753		struct pci_bus *bus;
   754		struct iommu_group *group = NULL;
   755		u64 devfns[4] = { 0 };
   756		
   757		if (pdev->is_virtfn && 
 > 758		   (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
   759			return iommu_group_get(&pdev->physfn->dev);
   760	
   761		if (WARN_ON(!dev_is_pci(dev)))

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 16057 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-06-09 12:09 ` [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
  2016-06-09 12:56   ` kbuild test robot
@ 2016-06-09 15:21   ` kbuild test robot
  2016-06-09 22:21   ` Alex Williamson
  2 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-09 15:21 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kbuild-all, kvm, linux-pci, bhelgaas, alex.williamson, noaos,
	haggaie, ogerlitz, liranl, ilyal

[-- Attachment #1: Type: text/plain, Size: 1577 bytes --]

Hi,

[auto build test ERROR on vfio/next]
[also build test ERROR on v4.7-rc2 next-20160609]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/VFIO-SR-IOV-support/20160609-202117
base:   https://github.com/awilliam/linux-vfio.git next
config: x86_64-randconfig-s4-06092146 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/iommu/iommu.c: In function 'pci_device_group':
>> drivers/iommu/iommu.c:758:10: error: 'struct pci_dev' has no member named 'physfn'; did you mean 'is_physfn'?
        (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
             ^~
   drivers/iommu/iommu.c:759:31: error: 'struct pci_dev' has no member named 'physfn'; did you mean 'is_physfn'?
      return iommu_group_get(&pdev->physfn->dev);
                                  ^~

vim +758 drivers/iommu/iommu.c

   752		struct group_for_pci_data data;
   753		struct pci_bus *bus;
   754		struct iommu_group *group = NULL;
   755		u64 devfns[4] = { 0 };
   756		
   757		if (pdev->is_virtfn && 
 > 758		   (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
   759			return iommu_group_get(&pdev->physfn->dev);
   760	
   761		if (WARN_ON(!dev_is_pci(dev)))

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 33858 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-06-09 12:09 ` [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
  2016-06-09 12:56   ` kbuild test robot
  2016-06-09 15:21   ` kbuild test robot
@ 2016-06-09 22:21   ` Alex Williamson
  2016-06-13  7:08     ` Ilya Lesokhin
  2 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2016-06-09 22:21 UTC (permalink / raw)
  To: Ilya Lesokhin; +Cc: kvm, linux-pci, bhelgaas, noaos, haggaie, ogerlitz, liranl

On Thu,  9 Jun 2016 15:09:30 +0300
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> Add a new PCI_DEV_FLAGS_UNTRUSTED to indicate that a PCI device
> is probed by a driver that gives untrusted entities access to that device.
> Make iommu_group_get_for_pci_dev check the new flag when an IOMMU
> group is selected for a virtual function.
> Mark VFIO devices with the new flag.
> 
> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
> ---
>  drivers/iommu/iommu.c       | 4 ++++
>  drivers/vfio/pci/vfio_pci.c | 3 +++
>  include/linux/pci.h         | 3 +++
>  3 files changed, 10 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3000051..9bb914c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -749,6 +749,10 @@ struct iommu_group *pci_device_group(struct device *dev)
>  	struct pci_bus *bus;
>  	struct iommu_group *group = NULL;
>  	u64 devfns[4] = { 0 };
> +	
> +	if (pdev->is_virtfn && 
> +	   (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
> +		return iommu_group_get(&pdev->physfn->dev);

This deserves a comment in the code as well as the commit log, but more
importantly the side effect of this is that the user can't make use of
separate IOMMU domains for the PF vs the VF.  I think this ends up
making the entire solution incompatible with things like vIOMMU since
we really need to be able to create separate address spaces per device
to make that work.  What's the point of enabling SR-IOV from userspace
if we can't do things like assign VFs to nested guests or userspace in
the guest?  This is an incomplete feature with that restriction.
Thanks,

Alex

>  
>  	if (WARN_ON(!dev_is_pci(dev)))
>  		return ERR_PTR(-EINVAL);
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 188b1ff..72d048e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1180,6 +1180,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  		pci_set_power_state(pdev, PCI_D3hot);
>  	}
>  
> +	pdev->dev_flags |= PCI_DEV_FLAGS_UNTRUSTED;
> +
>  	return ret;
>  }
>  
> @@ -1187,6 +1189,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
>  {
>  	struct vfio_pci_device *vdev;
>  
> +	pdev->dev_flags &= ~PCI_DEV_FLAGS_UNTRUSTED;
>  	vdev = vfio_del_group_dev(&pdev->dev);
>  	if (!vdev)
>  		return;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index b67e4df..bef9115 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -174,6 +174,9 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>  	/* Get VPD from function 0 VPD */
>  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +	/* Untrusted software controls this device
> +	 * The VFs of this device should be put in the device's IOMMUs group*/
> +	PCI_DEV_FLAGS_UNTRUSTED = (__force pci_dev_flags_t) (1 << 9),
>  };
>  
>  enum pci_irq_reroute_variant {


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver
  2016-06-09 12:09 ` [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
@ 2016-06-09 22:21   ` Alex Williamson
  2016-06-13  6:33     ` Ilya Lesokhin
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2016-06-09 22:21 UTC (permalink / raw)
  To: Ilya Lesokhin; +Cc: kvm, linux-pci, bhelgaas, noaos, haggaie, ogerlitz, liranl

On Thu,  9 Jun 2016 15:09:29 +0300
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> When a new device is added to a live VFIO group,
> use driver_override to force the device to be probed by
> the VFIO driver.
> 
> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
> ---
>  drivers/vfio/vfio.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 6fd6fa5..32bc1b4 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -646,6 +646,7 @@ static int vfio_dev_viable(struct device *dev, void *data)
>  static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
>  {
>  	struct vfio_device *device;
> +	struct pci_dev *pdev;
>  
>  	/* Do we already know about it?  We shouldn't */
>  	device = vfio_group_get_device(group, dev);
> @@ -658,10 +659,19 @@ static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
>  	if (!atomic_read(&group->container_users))
>  		return 0;
>  
> -	/* TODO Prevent device auto probing */
> -	WARN(1, "Device %s added to live group %d!\n", dev_name(dev),
> -	     iommu_group_id(group->iommu_group));
> -
> +	if (dev_is_pci(dev)) {

declare pdev here.

> +		/* a device that shares a group with another VFIO
> +		 * device should also be probed by VFIO
> +		 */

Use existing comment style:

/*
 * comment
 */

> +		pdev = to_pci_dev(dev);
> +		if (pdev->driver_override)
> +			kfree(pdev->driver_override);
> +		pdev->driver_override = kstrdup("vfio-pci", GFP_KERNEL);

kstrdup can fail.  Probably want to issue a dev_info message about
capturing the device as well.  Thanks,

Alex

> +	} else {
> +		/* TODO Prevent device auto probing */
> +		WARN(1, "Device %s added to live group %d!\n", dev_name(dev),
> +		     iommu_group_id(group->iommu_group));
> +	}
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver
  2016-06-09 22:21   ` Alex Williamson
@ 2016-06-13  6:33     ` Ilya Lesokhin
  2016-06-13 16:02       ` Alex Williamson
  0 siblings, 1 reply; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-13  6:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Friday, June 10, 2016 1:22 AM
> To: Ilya Lesokhin <ilyal@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> <liranl@mellanox.com>
> Subject: Re: [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with
> the VFIO driver
...
> > +		pdev = to_pci_dev(dev);
> > +		if (pdev->driver_override)
> > +			kfree(pdev->driver_override);
> > +		pdev->driver_override = kstrdup("vfio-pci", GFP_KERNEL);
> 
> kstrdup can fail.  Probably want to issue a about capturing
> the device as well.  Thanks,
> 

Any recommendation as to what I should do in case kstrdup fails?
Fallback to the existing warning ("device added to a live group") ? 

Thanks,
Ilya.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-06-09 22:21   ` Alex Williamson
@ 2016-06-13  7:08     ` Ilya Lesokhin
  2016-06-13 16:00       ` Alex Williamson
  0 siblings, 1 reply; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-13  7:08 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Friday, June 10, 2016 1:21 AM
> To: Ilya Lesokhin <ilyal@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> <liranl@mellanox.com>
> Subject: Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to
> be in the PFs IOMMU group
...
> This deserves a comment in the code as well as the commit log, but more
> importantly the side effect of this is that the user can't make use of separate
> IOMMU domains for the PF vs the VF.  I think this ends up making the entire
> solution incompatible with things like vIOMMU since we really need to be
> able to create separate address spaces per device to make that work.  What's
> the point of enabling SR-IOV from userspace if we can't do things like assign
> VFs to nested guests or userspace in the guest?  This is an incomplete
> feature with that restriction.

I agree that this is a problem and I will mention this limitation in the commit log.
However to address this properly you need nested IOMMU group which don't really exist.
This feature is still useful, at least for us, as you can enable sriov in a guest and work with
probed VFs in the same guest.
Furthermore, if you have a real nested IOMMU you should be able to do nested device 
assignment even though the PF and VF's belong to the same group.

I think we should push this feature with the limitation and hopefully
it will be addressed in the future, agree?

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-06-13  7:08     ` Ilya Lesokhin
@ 2016-06-13 16:00       ` Alex Williamson
  2016-06-16 12:46         ` Ilya Lesokhin
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2016-06-13 16:00 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

On Mon, 13 Jun 2016 07:08:03 +0000
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Friday, June 10, 2016 1:21 AM
> > To: Ilya Lesokhin <ilyal@mellanox.com>
> > Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> > Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> > <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> > <liranl@mellanox.com>
> > Subject: Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to
> > be in the PFs IOMMU group  
> ...
> > This deserves a comment in the code as well as the commit log, but more
> > importantly the side effect of this is that the user can't make use of separate
> > IOMMU domains for the PF vs the VF.  I think this ends up making the entire
> > solution incompatible with things like vIOMMU since we really need to be
> > able to create separate address spaces per device to make that work.  What's
> > the point of enabling SR-IOV from userspace if we can't do things like assign
> > VFs to nested guests or userspace in the guest?  This is an incomplete
> > feature with that restriction.  
> 
> I agree that this is a problem and I will mention this limitation in the commit log.
> However to address this properly you need nested IOMMU group which don't really exist.
> This feature is still useful, at least for us, as you can enable sriov in a guest and work with
> probed VFs in the same guest.
> Furthermore, if you have a real nested IOMMU you should be able to do nested device 
> assignment even though the PF and VF's belong to the same group.
> 
> I think we should push this feature with the limitation and hopefully
> it will be addressed in the future, agree?

Sorry, I don't agree, nor do I think that nested IOMMU groups are the
solution to the problem (or even really understand what nest IOMMU
groups are).  It seems we have an issue that an untrusted user is
creating devices and we're trying to overload the concept of an IOMMU
group to handle that.  An IOMMU group is meant to describe the DMA
isolation of a set of devices, so it really has no business being
overloaded to enforce ownership like this, nor can we assume that we
can support multiple IOMMU contexts within a group regardless of a "real
nested IOMMU".  We already see coming a very serious usage restriction
that a user cannot create independent IOMMU contexts as a direct result
of this hack, which severely limits future usefulness.  I don't know
what the solution is here, but I'm pretty sure this is not it.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver
  2016-06-13  6:33     ` Ilya Lesokhin
@ 2016-06-13 16:02       ` Alex Williamson
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2016-06-13 16:02 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

On Mon, 13 Jun 2016 06:33:53 +0000
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Friday, June 10, 2016 1:22 AM
> > To: Ilya Lesokhin <ilyal@mellanox.com>
> > Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> > Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> > <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> > <liranl@mellanox.com>
> > Subject: Re: [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with
> > the VFIO driver  
> ...
> > > +		pdev = to_pci_dev(dev);
> > > +		if (pdev->driver_override)
> > > +			kfree(pdev->driver_override);
> > > +		pdev->driver_override = kstrdup("vfio-pci", GFP_KERNEL);  
> > 
> > kstrdup can fail.  Probably want to issue a about capturing
> > the device as well.  Thanks,
> >   
> 
> Any recommendation as to what I should do in case kstrdup fails?
> Fallback to the existing warning ("device added to a live group") ? 

Probably so.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-06-13 16:00       ` Alex Williamson
@ 2016-06-16 12:46         ` Ilya Lesokhin
  0 siblings, 0 replies; 16+ messages in thread
From: Ilya Lesokhin @ 2016-06-16 12:46 UTC (permalink / raw)
  To: Alex Williamson, bhelgaas
  Cc: kvm, linux-pci, Noa Osherovich, Haggai Eran, Or Gerlitz, Liran Liss

You've convinced us that we should use separate groups.
But If I do that I can't think of a clean way to force the VFs to be probed by VFIO.

The ideas I have are:
	1. Add a driver_override parameter to pci_enable_sriov, sriov_enable  and virtfn_add.
	2. Register to the bus notifier of the PF's bus before calling pci_enable_sriov and then override the
	Driver in BUS_NOTIFY_ADD_DEVICE event like I did in the IOMMU group notifier when the group was shared.

Do you have a better idea? or a recommendation as to which of my ideas I should use?

Thank,
Ilya
	
> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Monday, June 13, 2016 7:01 PM
> To: Ilya Lesokhin <ilyal@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> <liranl@mellanox.com>
> Subject: Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to
> be in the PFs IOMMU group
> 
> On Mon, 13 Jun 2016 07:08:03 +0000
> Ilya Lesokhin <ilyal@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Friday, June 10, 2016 1:21 AM
> > > To: Ilya Lesokhin <ilyal@mellanox.com>
> > > Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org;
> > > bhelgaas@google.com; Noa Osherovich <noaos@mellanox.com>; Haggai
> > > Eran <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
> > > Liran Liss <liranl@mellanox.com>
> > > Subject: Re: [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF
> > > device to be in the PFs IOMMU group
> > ...
> > > This deserves a comment in the code as well as the commit log, but
> > > more importantly the side effect of this is that the user can't make
> > > use of separate IOMMU domains for the PF vs the VF.  I think this
> > > ends up making the entire solution incompatible with things like
> > > vIOMMU since we really need to be able to create separate address
> > > spaces per device to make that work.  What's the point of enabling
> > > SR-IOV from userspace if we can't do things like assign VFs to
> > > nested guests or userspace in the guest?  This is an incomplete feature
> with that restriction.
> >
> > I agree that this is a problem and I will mention this limitation in the commit
> log.
> > However to address this properly you need nested IOMMU group which
> don't really exist.
> > This feature is still useful, at least for us, as you can enable sriov
> > in a guest and work with probed VFs in the same guest.
> > Furthermore, if you have a real nested IOMMU you should be able to do
> > nested device assignment even though the PF and VF's belong to the same
> group.
> >
> > I think we should push this feature with the limitation and hopefully
> > it will be addressed in the future, agree?
> 
> Sorry, I don't agree, nor do I think that nested IOMMU groups are the
> solution to the problem (or even really understand what nest IOMMU
> groups are).  It seems we have an issue that an untrusted user is creating
> devices and we're trying to overload the concept of an IOMMU group to
> handle that.  An IOMMU group is meant to describe the DMA isolation of a
> set of devices, so it really has no business being overloaded to enforce
> ownership like this, nor can we assume that we can support multiple IOMMU
> contexts within a group regardless of a "real nested IOMMU".  We already
> see coming a very serious usage restriction that a user cannot create
> independent IOMMU contexts as a direct result of this hack, which severely
> limits future usefulness.  I don't know what the solution is here, but I'm
> pretty sure this is not it.  Thanks,
> 
> Alex

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-06-16 12:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-09 12:09 [PATCH 0/4] VFIO SR-IOV support Ilya Lesokhin
2016-06-09 12:09 ` [PATCH 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
2016-06-09 22:21   ` Alex Williamson
2016-06-13  6:33     ` Ilya Lesokhin
2016-06-13 16:02       ` Alex Williamson
2016-06-09 12:09 ` [PATCH 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
2016-06-09 12:56   ` kbuild test robot
2016-06-09 15:21   ` kbuild test robot
2016-06-09 22:21   ` Alex Williamson
2016-06-13  7:08     ` Ilya Lesokhin
2016-06-13 16:00       ` Alex Williamson
2016-06-16 12:46         ` Ilya Lesokhin
2016-06-09 12:09 ` [PATCH 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
2016-06-09 12:27   ` kbuild test robot
2016-06-09 12:09 ` [PATCH 4/4] VFIO: Add support for SR-IOV extended capablity Ilya Lesokhin
2016-06-09 12:09 ` [PATCH 4/4] VFIO: Add support for SRIOV " Ilya Lesokhin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.