All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC V2 0/4] VFIO SRIOV support
@ 2016-02-04  8:28 Ilya Lesokhin
  2016-02-04  8:28 ` [RFC V2 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Ilya Lesokhin @ 2016-02-04  8:28 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Changes from v1:
	Due to the security concern raised in v1, we add two patches
	to make sure the VFs belong to the same IOMMU group as
	the PF and are probed by VFIO.

Today the QEMU hypervisor allows assigning a physical device to a VM,
facilitating driver development. However, it does not support enabling
SR-IOV by the VM kernel driver. Our goal is to implement such support,
allowing developers working on SR-IOV physical function drivers to work
inside VMs as well.

This patch series implements the kernel side of our solution.  It extends
the VFIO driver to support the PCIE SRIOV extended capability with
following features:
1. The ability to probe SRIOV BAR sizes.
2. The ability to enable and disable sriov.

This patch series is going to be used by QEMU to expose sriov capabilities
to VM. We already have an early prototype based on Knut Omang's patches for
SRIOV[1]. 

Open issues:
1. How to tell if it is safe to disable SRIOV?
In the current implementation, a userspace can enable sriov, grab one of
the VFs and then call disable sriov without releasing the device.  This
will result in a deadlock where the user process is stuck inside disable
sriov waiting for itself to release the device. Killing the process leaves
it in a zombie state.
We also get a strange warning saying:
[  181.668492] WARNING: CPU: 22 PID: 3684 at kernel/sched/core.c:7497 __might_sleep+0x77/0x80() 
[  181.668502] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810aa193>] prepare_to_wait_event+0x63/0xf0

2. How to expose the Supported Page Sizes and System Page Size registers in
the SRIOV capability? 
Presently the hypervisor initializes Supported Page Sizes once and assumes
it doesn't change therefore we cannot allow user space to change this
register at will. The first solution that comes to mind is to expose a
device that only supports the page size selected by the hypervisor.
Unfourtently, Per SR-IOV spec section 3.3.12, PFs are required to support
4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes. We currently map both
registers as virtualized and read only and leave user space to worry about
this problem.

3. Other SRIOV capabilities.
Do we want to hide capabilities we do not support in the SR-IOV
Capabilities register? or leave it to the userspace application?

[1] https://github.com/knuto/qemu/tree/sriov_patches_v6

Ilya Lesokhin (4):
  VFIO: Probe new devices in a live VFIO group with the VFIO driver
  IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU
    group
  PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  VFIO: Add support for SRIOV extended capablity

 drivers/iommu/iommu.c              |   4 +
 drivers/pci/iov.c                  |   4 +-
 drivers/vfio/pci/vfio_pci.c        |   3 +
 drivers/vfio/pci/vfio_pci_config.c | 169 +++++++++++++++++++++++++++++++++----
 drivers/vfio/vfio.c                |  18 +++-
 include/linux/pci.h                |   5 ++
 6 files changed, 182 insertions(+), 21 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC V2 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver
  2016-02-04  8:28 [RFC V2 0/4] VFIO SRIOV support Ilya Lesokhin
@ 2016-02-04  8:28 ` Ilya Lesokhin
  2016-02-04  8:28 ` [RFC V2 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Ilya Lesokhin @ 2016-02-04  8:28 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

When a new device is added to a live VFIO group,
use driver_override to force the device to be probed by
the VFIO driver.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
---
 drivers/vfio/vfio.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 563c510..176647e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -504,6 +504,7 @@ static int vfio_dev_viable(struct device *dev, void *data)
 static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
 {
 	struct vfio_device *device;
+	struct pci_dev *pdev;
 
 	/* Do we already know about it?  We shouldn't */
 	device = vfio_group_get_device(group, dev);
@@ -516,9 +517,20 @@ static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
 	if (!atomic_read(&group->container_users))
 		return 0;
 
-	/* TODO Prevent device auto probing */
-	WARN("Device %s added to live group %d!\n", dev_name(dev),
-	     iommu_group_id(group->iommu_group));
+	if (dev_is_pci(dev)) {
+		/* a device that shares a group with another VFIO
+		 * device should also be probed by VFIO
+		 */
+		pdev = to_pci_dev(dev);
+		if (pdev->driver_override)
+			kfree(pdev->driver_override);
+
+		pdev->driver_override = kstrdup("vfio-pci", GFP_KERNEL);
+	} else {
+		/* TODO Prevent device auto probing */
+		WARN("Device %s added to live group %d!\n", dev_name(dev),
+		     iommu_group_id(group->iommu_group));
+	}
 
 	return 0;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC V2 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-02-04  8:28 [RFC V2 0/4] VFIO SRIOV support Ilya Lesokhin
  2016-02-04  8:28 ` [RFC V2 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
@ 2016-02-04  8:28 ` Ilya Lesokhin
  2016-02-25 15:37   ` Bjorn Helgaas
  2016-02-04  8:28 ` [RFC V2 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Ilya Lesokhin @ 2016-02-04  8:28 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Add a new PCI_DEV_FLAGS_UNTRUSTED to indicate that a PCI device
is probed by a driver that gives untrusted entities access to that device.
Make iommu_group_get_for_pci_dev check the new flag when an IOMMU
group is selected for a virtual function.
Mark VFIO devices with the new flag.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
---
 drivers/iommu/iommu.c       | 4 ++++
 drivers/vfio/pci/vfio_pci.c | 3 +++
 include/linux/pci.h         | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 049df49..864b459 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -738,6 +738,10 @@ static struct iommu_group *iommu_group_get_for_pci_dev(struct pci_dev *pdev)
 	struct iommu_group *group = NULL;
 	u64 devfns[4] = { 0 };
 
+	if (pdev->is_virtfn && 
+	   (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
+		return iommu_group_get(&pdev->physfn->dev);
+	
 	/*
 	 * Find the upstream DMA alias for the device.  A device must not
 	 * be aliased due to topology in order to have its own IOMMU group.
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 964ad57..ddcfd2c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -982,6 +982,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		pci_set_power_state(pdev, PCI_D3hot);
 	}
 
+	pdev->dev_flags |= PCI_DEV_FLAGS_UNTRUSTED;
+
 	return ret;
 }
 
@@ -989,6 +991,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 {
 	struct vfio_pci_device *vdev;
 
+	pdev->dev_flags &= ~PCI_DEV_FLAGS_UNTRUSTED;
 	vdev = vfio_del_group_dev(&pdev->dev);
 	if (!vdev)
 		return;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e90eb22..6330327 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -182,6 +182,7 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	PCI_DEV_FLAGS_UNTRUSTED = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC V2 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  2016-02-04  8:28 [RFC V2 0/4] VFIO SRIOV support Ilya Lesokhin
  2016-02-04  8:28 ` [RFC V2 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
  2016-02-04  8:28 ` [RFC V2 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
@ 2016-02-04  8:28 ` Ilya Lesokhin
  2016-02-04  8:28 ` [RFC V2 4/4] VFIO: Add support for SRIOV extended capablity Ilya Lesokhin
  2016-02-25 15:35 ` [RFC V2 0/4] VFIO SRIOV support Bjorn Helgaas
  4 siblings, 0 replies; 8+ messages in thread
From: Ilya Lesokhin @ 2016-02-04  8:28 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Expose iov_set_numvfs and iov_resource_size to make them available
for VFIO-PCI sriov support.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/pci/iov.c   | 4 +++-
 include/linux/pci.h | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..f296bd3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -41,7 +41,7 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
  *
  * Update iov->offset and iov->stride when NumVFs is written.
  */
-static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 {
 	struct pci_sriov *iov = dev->sriov;
 
@@ -49,6 +49,7 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
+EXPORT_SYMBOL(pci_iov_set_numvfs);
 
 /*
  * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
@@ -107,6 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 
 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
+EXPORT_SYMBOL(pci_iov_resource_size);
 
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 6330327..4dda3fe 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1725,6 +1725,8 @@ int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
+
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn);
 #else
 static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
@@ -1746,6 +1748,8 @@ static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 { return 0; }
+
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) { }
 #endif
 
 #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC V2 4/4] VFIO: Add support for SRIOV extended capablity
  2016-02-04  8:28 [RFC V2 0/4] VFIO SRIOV support Ilya Lesokhin
                   ` (2 preceding siblings ...)
  2016-02-04  8:28 ` [RFC V2 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
@ 2016-02-04  8:28 ` Ilya Lesokhin
  2016-02-25 15:35 ` [RFC V2 0/4] VFIO SRIOV support Bjorn Helgaas
  4 siblings, 0 replies; 8+ messages in thread
From: Ilya Lesokhin @ 2016-02-04  8:28 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Add support for PCIE SRIOV extended capablity with following features:
1. The ability to probe SRIOV BAR sizes.
2. The ability to enable and disable sriov.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/vfio/pci/vfio_pci_config.c | 169 +++++++++++++++++++++++++++++++++----
 1 file changed, 152 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index ff75ca3..04e364f 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -420,6 +420,35 @@ static __le32 vfio_generate_bar_flags(struct pci_dev *pdev, int bar)
 	return cpu_to_le32(val);
 }
 
+static void vfio_sriov_bar_fixup(struct vfio_pci_device *vdev,
+				 int sriov_cap_start)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int i;
+	__le32 *bar;
+	u64 mask;
+
+	bar = (__le32 *)&vdev->vconfig[sriov_cap_start + PCI_SRIOV_BAR];
+
+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
+		if (!pci_resource_start(pdev, i)) {
+			*bar = 0; /* Unmapped by host = unimplemented to user */
+			continue;
+		}
+
+		mask = ~(pci_iov_resource_size(pdev, i) - 1);
+
+		*bar &= cpu_to_le32((u32)mask);
+		*bar |= vfio_generate_bar_flags(pdev, i);
+
+		if (*bar & cpu_to_le32(PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+			bar++;
+			*bar &= cpu_to_le32((u32)(mask >> 32));
+			i++;
+		}
+	}
+}
+
 /*
  * Pretend we're hardware and tweak the values of the *virtual* PCI BARs
  * to reflect the hardware capabilities.  This implements BAR sizing.
@@ -782,6 +811,124 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 	return 0;
 }
 
+static int __init init_pci_ext_cap_sriov_perm(struct perm_bits *perm)
+{
+	int i;
+
+	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_SRIOV]))
+		return -ENOMEM;
+
+	/*
+	 * Virtualize the first dword of all express capabilities
+	 * because it includes the next pointer.  This lets us later
+	 * remove capabilities from the chain if we need to.
+	 */
+	p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+
+	/* VF Enable - Virtualized and writable
+	 * Memory Space Enable - Non-virtualized and writable
+	 */
+	p_setw(perm, PCI_SRIOV_CTRL, PCI_SRIOV_CTRL_VFE,
+	       PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
+
+	p_setw(perm, PCI_SRIOV_NUM_VF, (u16)ALL_VIRT, (u16)ALL_WRITE);
+	p_setw(perm, PCI_SRIOV_SUP_PGSIZE, (u16)ALL_VIRT, 0);
+
+	/* We cannot let user space application change the page size
+	 * so we mark it as read only and trust the user application
+	 * (e.g. qemu) to virtualize this correctly for the guest
+	 */
+	p_setw(perm, PCI_SRIOV_SYS_PGSIZE, (u16)ALL_VIRT, 0);
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+		p_setd(perm, PCI_SRIOV_BAR + 4 * i, ALL_VIRT, ALL_WRITE);
+
+	return 0;
+}
+
+static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
+{
+	u8 cap;
+	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
+						 PCI_STD_HEADER_SIZEOF;
+	cap = vdev->pci_config_map[pos];
+
+	if (cap == PCI_CAP_ID_BASIC)
+		return 0;
+
+	/* XXX Can we have to abutting capabilities of the same type? */
+	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
+		pos--;
+
+	return pos;
+}
+
+static int vfio_sriov_cap_config_read(struct vfio_pci_device *vdev, int pos,
+				      int count, struct perm_bits *perm,
+				       int offset, __le32 *val)
+{
+	int cap_start = vfio_find_cap_start(vdev, pos);
+
+	vfio_sriov_bar_fixup(vdev, cap_start);
+	return vfio_default_config_read(vdev, pos, count, perm, offset, val);
+}
+
+static int vfio_sriov_cap_config_write(struct vfio_pci_device *vdev, int pos,
+				       int count, struct perm_bits *perm,
+				       int offset, __le32 val)
+{
+	int ret;
+	int cap_start = vfio_find_cap_start(vdev, pos);
+	u16 sriov_ctrl = *(u16 *)(vdev->vconfig + cap_start + PCI_SRIOV_CTRL);
+	bool cur_vf_enabled = sriov_ctrl & PCI_SRIOV_CTRL_VFE;
+	bool vf_enabled;
+
+	switch (offset) {
+	case  PCI_SRIOV_NUM_VF:
+	/* Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset
+	 * and VF Stride may change when NumVFs changes.
+	 *
+	 * Therefore we should pass valid writes to the hardware.
+	 *
+	 * Per SR-IOV spec sec 3.3.7
+	 * The results are undefined if NumVFs is set to a value greater
+	 * than TotalVFs.
+	 * NumVFs may only be written while VF Enable is Clear.
+	 * If NumVFs is written when VF Enable is Set, the results
+	 * are undefined.
+
+	 * Avoid passing such writes to the Hardware just in case.
+	 */
+		if (cur_vf_enabled ||
+		    val > pci_sriov_get_totalvfs(vdev->pdev))
+			return count;
+
+		pci_iov_set_numvfs(vdev->pdev, val);
+		break;
+
+	case PCI_SRIOV_CTRL:
+		vf_enabled = val & PCI_SRIOV_CTRL_VFE;
+
+		if (!cur_vf_enabled && vf_enabled) {
+			u16 num_vfs = *(u16 *)(vdev->vconfig +
+					cap_start +
+					PCI_SRIOV_NUM_VF);
+			ret = pci_enable_sriov(vdev->pdev, num_vfs);
+			if (ret)
+				return count;
+		} else if (cur_vf_enabled && !vf_enabled) {
+			pci_disable_sriov(vdev->pdev);
+		}
+		break;
+
+	default:
+		break;
+	}
+
+	return vfio_default_config_write(vdev, pos, count, perm,
+					 offset, val);
+}
+
 /*
  * Initialize the shared permission tables
  */
@@ -796,6 +943,7 @@ void vfio_pci_uninit_perm_bits(void)
 
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
 }
 
 int __init vfio_pci_init_perm_bits(void)
@@ -818,29 +966,16 @@ int __init vfio_pci_init_perm_bits(void)
 	ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 
+	ret |= init_pci_ext_cap_sriov_perm(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].readfn = vfio_sriov_cap_config_read;
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].writefn = vfio_sriov_cap_config_write;
+
 	if (ret)
 		vfio_pci_uninit_perm_bits();
 
 	return ret;
 }
 
-static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
-{
-	u8 cap;
-	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
-						 PCI_STD_HEADER_SIZEOF;
-	cap = vdev->pci_config_map[pos];
-
-	if (cap == PCI_CAP_ID_BASIC)
-		return 0;
-
-	/* XXX Can we have to abutting capabilities of the same type? */
-	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
-		pos--;
-
-	return pos;
-}
-
 static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 *val)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC V2 0/4] VFIO SRIOV support
  2016-02-04  8:28 [RFC V2 0/4] VFIO SRIOV support Ilya Lesokhin
                   ` (3 preceding siblings ...)
  2016-02-04  8:28 ` [RFC V2 4/4] VFIO: Add support for SRIOV extended capablity Ilya Lesokhin
@ 2016-02-25 15:35 ` Bjorn Helgaas
  4 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2016-02-25 15:35 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, alex.williamson, noaos, haggaie,
	ogerlitz, liranl

Hi Ilya,

On Thu, Feb 04, 2016 at 10:28:53AM +0200, Ilya Lesokhin wrote:
> Changes from v1:
> 	Due to the security concern raised in v1, we add two patches
> 	to make sure the VFs belong to the same IOMMU group as
> 	the PF and are probed by VFIO.
> 
> Today the QEMU hypervisor allows assigning a physical device to a VM,
> facilitating driver development. However, it does not support enabling
> SR-IOV by the VM kernel driver. Our goal is to implement such support,
> allowing developers working on SR-IOV physical function drivers to work
> inside VMs as well.
> 
> This patch series implements the kernel side of our solution.  It extends
> the VFIO driver to support the PCIE SRIOV extended capability with
> following features:
> 1. The ability to probe SRIOV BAR sizes.
> 2. The ability to enable and disable sriov.
> 
> This patch series is going to be used by QEMU to expose sriov capabilities
> to VM. We already have an early prototype based on Knut Omang's patches for
> SRIOV[1]. 
> 
> Open issues:
> 1. How to tell if it is safe to disable SRIOV?
> In the current implementation, a userspace can enable sriov, grab one of
> the VFs and then call disable sriov without releasing the device.  This
> will result in a deadlock where the user process is stuck inside disable
> sriov waiting for itself to release the device. Killing the process leaves
> it in a zombie state.
> We also get a strange warning saying:
> [  181.668492] WARNING: CPU: 22 PID: 3684 at kernel/sched/core.c:7497 __might_sleep+0x77/0x80() 
> [  181.668502] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810aa193>] prepare_to_wait_event+0x63/0xf0
> 
> 2. How to expose the Supported Page Sizes and System Page Size registers in
> the SRIOV capability? 
> Presently the hypervisor initializes Supported Page Sizes once and assumes
> it doesn't change therefore we cannot allow user space to change this
> register at will. The first solution that comes to mind is to expose a
> device that only supports the page size selected by the hypervisor.
> Unfourtently, Per SR-IOV spec section 3.3.12, PFs are required to support
> 4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes. We currently map both
> registers as virtualized and read only and leave user space to worry about
> this problem.
> 
> 3. Other SRIOV capabilities.
> Do we want to hide capabilities we do not support in the SR-IOV
> Capabilities register? or leave it to the userspace application?
> 
> [1] https://github.com/knuto/qemu/tree/sriov_patches_v6
> 
> Ilya Lesokhin (4):
>   VFIO: Probe new devices in a live VFIO group with the VFIO driver
>   IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU
>     group
>   PCI: Expose iov_set_numvfs and iov_resource_size for modules.
>   VFIO: Add support for SRIOV extended capablity
> 
>  drivers/iommu/iommu.c              |   4 +
>  drivers/pci/iov.c                  |   4 +-
>  drivers/vfio/pci/vfio_pci.c        |   3 +
>  drivers/vfio/pci/vfio_pci_config.c | 169 +++++++++++++++++++++++++++++++++----
>  drivers/vfio/vfio.c                |  18 +++-
>  include/linux/pci.h                |   5 ++
>  6 files changed, 182 insertions(+), 21 deletions(-)

This is mostly a VFIO series, so I'm going to defer to Alex.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC V2 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-02-04  8:28 ` [RFC V2 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
@ 2016-02-25 15:37   ` Bjorn Helgaas
  2016-02-25 17:54     ` Bjorn Helgaas
  0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2016-02-25 15:37 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, alex.williamson, noaos, haggaie,
	ogerlitz, liranl

On Thu, Feb 04, 2016 at 10:28:55AM +0200, Ilya Lesokhin wrote:
> Add a new PCI_DEV_FLAGS_UNTRUSTED to indicate that a PCI device
> is probed by a driver that gives untrusted entities access to that device.
> Make iommu_group_get_for_pci_dev check the new flag when an IOMMU
> group is selected for a virtual function.
> Mark VFIO devices with the new flag.
> 
> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
> ---
>  drivers/iommu/iommu.c       | 4 ++++
>  drivers/vfio/pci/vfio_pci.c | 3 +++
>  include/linux/pci.h         | 1 +
>  3 files changed, 8 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 049df49..864b459 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -738,6 +738,10 @@ static struct iommu_group *iommu_group_get_for_pci_dev(struct pci_dev *pdev)
>  	struct iommu_group *group = NULL;
>  	u64 devfns[4] = { 0 };
>  
> +	if (pdev->is_virtfn && 
> +	   (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
> +		return iommu_group_get(&pdev->physfn->dev);
> +	
>  	/*
>  	 * Find the upstream DMA alias for the device.  A device must not
>  	 * be aliased due to topology in order to have its own IOMMU group.
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 964ad57..ddcfd2c 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -982,6 +982,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  		pci_set_power_state(pdev, PCI_D3hot);
>  	}
>  
> +	pdev->dev_flags |= PCI_DEV_FLAGS_UNTRUSTED;
> +
>  	return ret;
>  }
>  
> @@ -989,6 +991,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
>  {
>  	struct vfio_pci_device *vdev;
>  
> +	pdev->dev_flags &= ~PCI_DEV_FLAGS_UNTRUSTED;
>  	vdev = vfio_del_group_dev(&pdev->dev);
>  	if (!vdev)
>  		return;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e90eb22..6330327 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -182,6 +182,7 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>  	/* Get VPD from function 0 VPD */
>  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +	PCI_DEV_FLAGS_UNTRUSTED = (__force pci_dev_flags_t) (1 << 9),

I'm raising my eyebrows a bit at this.  PCI_DEV_FLAGS_UNTRUSTED
doesn't seem like a PCI core property, so it seems like the PCI core
is an innocent bystander here (it neither sets nor checks the flag),
and you're asking it to keep track of bookkeeping details for other
unrelated entities.

>  };
>  
>  enum pci_irq_reroute_variant {
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC V2 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group
  2016-02-25 15:37   ` Bjorn Helgaas
@ 2016-02-25 17:54     ` Bjorn Helgaas
  0 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2016-02-25 17:54 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, alex.williamson, noaos, haggaie,
	ogerlitz, liranl

[Hi Ilya, I think your response didn't go to the list because it was
not plain text only (please fix your email client).  I'm inserting
your response manually below.]

Ilya wrote:
> On Thu, Feb 25, 2016 at 09:37:26AM -0600, Bjorn Helgaas wrote:
> > On Thu, Feb 04, 2016 at 10:28:55AM +0200, Ilya Lesokhin wrote:
> > > Add a new PCI_DEV_FLAGS_UNTRUSTED to indicate that a PCI device
> > > is probed by a driver that gives untrusted entities access to that device.
> > > Make iommu_group_get_for_pci_dev check the new flag when an IOMMU
> > > group is selected for a virtual function.
> > > Mark VFIO devices with the new flag.
> > > 
> > > Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
> > > ---
> > >  drivers/iommu/iommu.c       | 4 ++++
> > >  drivers/vfio/pci/vfio_pci.c | 3 +++
> > >  include/linux/pci.h         | 1 +
> > >  3 files changed, 8 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 049df49..864b459 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -738,6 +738,10 @@ static struct iommu_group *iommu_group_get_for_pci_dev(struct pci_dev *pdev)
> > >  	struct iommu_group *group = NULL;
> > >  	u64 devfns[4] = { 0 };
> > >  
> > > +	if (pdev->is_virtfn && 
> > > +	   (pdev->physfn->dev_flags & PCI_DEV_FLAGS_UNTRUSTED))
> > > +		return iommu_group_get(&pdev->physfn->dev);
> > > +	
> > >  	/*
> > >  	 * Find the upstream DMA alias for the device.  A device must not
> > >  	 * be aliased due to topology in order to have its own IOMMU group.
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index 964ad57..ddcfd2c 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -982,6 +982,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  		pci_set_power_state(pdev, PCI_D3hot);
> > >  	}
> > >  
> > > +	pdev->dev_flags |= PCI_DEV_FLAGS_UNTRUSTED;
> > > +
> > >  	return ret;
> > >  }
> > >  
> > > @@ -989,6 +991,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> > >  {
> > >  	struct vfio_pci_device *vdev;
> > >  
> > > +	pdev->dev_flags &= ~PCI_DEV_FLAGS_UNTRUSTED;
> > >  	vdev = vfio_del_group_dev(&pdev->dev);
> > >  	if (!vdev)
> > >  		return;
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index e90eb22..6330327 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -182,6 +182,7 @@ enum pci_dev_flags {
> > >  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> > >  	/* Get VPD from function 0 VPD */
> > >  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> > > +	PCI_DEV_FLAGS_UNTRUSTED = (__force pci_dev_flags_t) (1 << 9),
> > 
> > I'm raising my eyebrows a bit at this.  PCI_DEV_FLAGS_UNTRUSTED
> > doesn't seem like a PCI core property, so it seems like the PCI core
> > is an innocent bystander here (it neither sets nor checks the flag),
> > and you're asking it to keep track of bookkeeping details for other
> > unrelated entities.
> 
> PCI_DEV_FLAGS_UNTRUSTED is quite similar to
> PCI_DEV_FLAGS_DMA_ALIAS_DEVFN,
> 
> they both indicate that a device needs to be put in the same IOMMU
> group as another device.
> 
> In fact, we initially though about overloading the meaning of
> PCI_DEV_FLAGS_DMA_ALIAS_DEVFN
> 
> To force the VFs in the same group as the PF (if its probed by VFIO).

It's true that it's similar to PCI_DEV_FLAGS_DMA_ALIAS_DEVFN.  I don't
really like that either :)

There's a little bit of current discussion about that here: 
http://lkml.kernel.org/r/20160224194406.7585.17447.stgit@bhelgaas-glaptop2.roam.corp.google.com

I don't know if I have a better suggestion yet.  Having a real PCI
interface, even if just simple wrappers that set/test the bit, at
least provides a place for an explanatory comment, so that would be a
little better in my mind.

Bjorn

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-02-25 17:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-04  8:28 [RFC V2 0/4] VFIO SRIOV support Ilya Lesokhin
2016-02-04  8:28 ` [RFC V2 1/4] VFIO: Probe new devices in a live VFIO group with the VFIO driver Ilya Lesokhin
2016-02-04  8:28 ` [RFC V2 2/4] IOMMU: Force the VFs of an untrusted PF device to be in the PFs IOMMU group Ilya Lesokhin
2016-02-25 15:37   ` Bjorn Helgaas
2016-02-25 17:54     ` Bjorn Helgaas
2016-02-04  8:28 ` [RFC V2 3/4] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
2016-02-04  8:28 ` [RFC V2 4/4] VFIO: Add support for SRIOV extended capablity Ilya Lesokhin
2016-02-25 15:35 ` [RFC V2 0/4] VFIO SRIOV support Bjorn Helgaas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.