* [PATCH v2 0/2] VFIO SRIOV support
@ 2016-06-19 12:16 Ilya Lesokhin
  2016-06-19 12:16 ` [PATCH v2 1/2] PCI: Extend PCI IOV API Ilya Lesokhin
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Ilya Lesokhin @ 2016-06-19 12:16 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Changes from V1:
	1. The VFs are no longer assigned to the PF's iommu group
	2. Add a pci_enable_sriov_with_override API to allow
	enabling SR-IOV without probing the VFs with the
	default driver

Changes from RFC V2:
        1. pci_disable_sriov() is now called from a workqueue
        to avoid the situation where a process is blocked
        in pci_disable_sriov() waiting for itself to release the VFs.
        2. A mutex was added to synchronize calls to
        pci_enable_sriov() and pci_disable_sriov().

Changes from RFC V1:
        Due to the security concern raised in RFC V1, we add two patches
        to make sure the VFs belong to the same IOMMU group as
        the PF and are probed by VFIO.

Today the QEMU hypervisor allows assigning a physical device to a VM,
facilitating driver development. However, it does not support enabling
SR-IOV by the VM kernel driver. Our goal is to implement such support,
allowing developers working on SR-IOV physical function drivers to work
inside VMs as well.

This patch series implements the kernel side of our solution.  It extends
the VFIO driver to support the PCIe SR-IOV extended capability with the
following features:
1. The ability to probe SR-IOV BAR sizes.
2. The ability to enable and disable SR-IOV.

This patch series is going to be used by QEMU to expose SR-IOV capabilities
to VMs. We already have an early prototype based on Knut Omang's patches for
SR-IOV [1].

Limitations:
1. Per SR-IOV spec section 3.3.12, PFs are required to support
4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes.
Unfortunately, the kernel currently initializes the System Page Size register once
and assumes it doesn't change, so we cannot allow guests to change this
register at will. We currently map both the Supported Page Sizes and
System Page Size registers as virtualized and read-only, in violation of the spec.
In practice this is not an issue, since both the hypervisor and the
guest typically select the same System Page Size.

[1] https://github.com/knuto/qemu/tree/sriov_patches_v6

Ilya Lesokhin (2):
  PCI: Extend PCI IOV API
  VFIO: Add support for SR-IOV extended capability

 drivers/pci/iov.c                   |  41 +++++--
 drivers/vfio/pci/vfio_pci.c         |   1 +
 drivers/vfio/pci/vfio_pci_config.c  | 210 +++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   1 +
 include/linux/pci.h                 |  13 ++-
 5 files changed, 240 insertions(+), 26 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] PCI: Extend PCI IOV API
  2016-06-19 12:16 [PATCH v2 0/2] VFIO SRIOV support Ilya Lesokhin
@ 2016-06-19 12:16 ` Ilya Lesokhin
  2016-06-19 14:10   ` kbuild test robot
  2016-06-19 12:16 ` [PATCH v2 2/2] VFIO: Add support for SR-IOV extended capability Ilya Lesokhin
  2016-06-20 17:37 ` [PATCH v2 0/2] VFIO SRIOV support Alex Williamson
  2 siblings, 1 reply; 20+ messages in thread
From: Ilya Lesokhin @ 2016-06-19 12:16 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

1. Add pci_enable_sriov_with_override to allow
enabling SR-IOV with a driver override
on the VFs.

2. Expose pci_iov_set_numvfs and pci_iov_resource_size
to make them available to other modules.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/pci/iov.c   | 41 +++++++++++++++++++++++++++++++++--------
 include/linux/pci.h | 13 ++++++++++++-
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2194b44..5ffd926 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -41,7 +41,7 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
  *
  * Update iov->offset and iov->stride when NumVFs is written.
  */
-static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 {
 	struct pci_sriov *iov = dev->sriov;
 
@@ -49,6 +49,7 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
+EXPORT_SYMBOL(pci_iov_set_numvfs);
 
 /*
  * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
@@ -112,8 +113,10 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 
 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
+EXPORT_SYMBOL(pci_iov_resource_size);
 
-int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset)
+int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset,
+		       char *driver_override)
 {
 	int i;
 	int rc = -ENOMEM;
@@ -154,14 +157,20 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset)
 		rc = request_resource(res, &virtfn->resource[i]);
 		BUG_ON(rc);
 	}
-
 	if (reset)
 		__pci_reset_function(virtfn);
 
 	pci_device_add(virtfn, virtfn->bus);
 	mutex_unlock(&iov->dev->sriov->lock);
 
+	if (driver_override) {
+		virtfn->driver_override = kstrdup(driver_override, GFP_KERNEL);
+		if (!virtfn->driver_override)
+			goto failed1;
+	}
+
 	pci_bus_add_device(virtfn);
+
 	sprintf(buf, "virtfn%u", id);
 	rc = sysfs_create_link(&dev->dev.kobj, &virtfn->dev.kobj, buf);
 	if (rc)
@@ -235,7 +244,8 @@ int __weak pcibios_sriov_disable(struct pci_dev *pdev)
 	return 0;
 }
 
-static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
+static int sriov_enable(struct pci_dev *dev, int nr_virtfn,
+			char *driver_override)
 {
 	int rc;
 	int i;
@@ -321,7 +331,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	}
 
 	for (i = 0; i < initial; i++) {
-		rc = pci_iov_add_virtfn(dev, i, 0);
+		rc = pci_iov_add_virtfn(dev, i, 0, driver_override);
 		if (rc)
 			goto failed;
 	}
@@ -622,20 +632,35 @@ int pci_iov_bus_range(struct pci_bus *bus)
 }
 
 /**
- * pci_enable_sriov - enable the SR-IOV capability
+ * pci_enable_sriov_with_override - enable the SR-IOV capability
  * @dev: the PCI device
  * @nr_virtfn: number of virtual functions to enable
+ * driver_override: driver override for VFs
  *
  * Returns 0 on success, or negative on failure.
  */
-int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
+int pci_enable_sriov_with_override(struct pci_dev *dev, int nr_virtfn,
+				   char *driver_override)
 {
 	might_sleep();
 
 	if (!dev->is_physfn)
 		return -ENOSYS;
 
-	return sriov_enable(dev, nr_virtfn);
+	return sriov_enable(dev, nr_virtfn, driver_override);
+}
+EXPORT_SYMBOL_GPL(pci_enable_sriov_with_override);
+
+/**
+ * pci_enable_sriov - enable the SR-IOV capability
+ * @dev: the PCI device
+ * @nr_virtfn: number of virtual functions to enable
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
+{
+	return pci_enable_sriov_with_override(dev, nr_virtfn, NULL);
 }
 EXPORT_SYMBOL_GPL(pci_enable_sriov);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b67e4df..54b3059 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1739,15 +1739,20 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
 int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
 int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 
+int pci_enable_sriov_with_override(struct pci_dev *dev, int nr_virtfn,
+				   char *driver_override);
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
-int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset);
+int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset,
+		       char *driver_override);
 void pci_iov_remove_virtfn(struct pci_dev *dev, int id, int reset);
 int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
+
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn);
 #else
 static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
@@ -1757,6 +1762,11 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
 {
 	return -ENOSYS;
 }
+
+static inline int pci_enable_sriov_with_override(struct pci_dev *dev,
+						 int nr_virtfn,
+						 char *driver_override)
+{ return -ENODEV; }
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset)
@@ -1775,6 +1785,7 @@ static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 { return 0; }
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) { }
 #endif
 
 #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
-- 
1.8.3.1



* [PATCH v2 2/2] VFIO: Add support for SR-IOV extended capability
  2016-06-19 12:16 [PATCH v2 0/2] VFIO SRIOV support Ilya Lesokhin
  2016-06-19 12:16 ` [PATCH v2 1/2] PCI: Extend PCI IOV API Ilya Lesokhin
@ 2016-06-19 12:16 ` Ilya Lesokhin
  2016-06-19 23:07   ` kbuild test robot
  2016-06-20 17:37 ` [PATCH v2 0/2] VFIO SRIOV support Alex Williamson
  2 siblings, 1 reply; 20+ messages in thread
From: Ilya Lesokhin @ 2016-06-19 12:16 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Add support for the PCIe SR-IOV extended capability with the following features:
1. The ability to probe SR-IOV BAR sizes.
2. The ability to enable and disable SR-IOV.

Since pci_enable_sriov and pci_disable_sriov are not thread-safe,
a new mutex was added to vfio_pci_device to protect those functions.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/vfio/pci/vfio_pci.c         |   1 +
 drivers/vfio/pci/vfio_pci_config.c  | 210 +++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   1 +
 3 files changed, 195 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 188b1ff..2b194c7 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1151,6 +1151,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	vdev->pdev = pdev;
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	mutex_init(&vdev->igate);
+	mutex_init(&vdev->sriov_mutex);
 	spin_lock_init(&vdev->irqlock);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 688691d..22553d6 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -448,6 +448,35 @@ static __le32 vfio_generate_bar_flags(struct pci_dev *pdev, int bar)
 	return cpu_to_le32(val);
 }
 
+static void vfio_sriov_bar_fixup(struct vfio_pci_device *vdev,
+				 int sriov_cap_start)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int i;
+	__le32 *bar;
+	u64 mask;
+
+	bar = (__le32 *)&vdev->vconfig[sriov_cap_start + PCI_SRIOV_BAR];
+
+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
+		if (!pci_resource_start(pdev, i)) {
+			*bar = 0; /* Unmapped by host = unimplemented to user */
+			continue;
+		}
+
+		mask = ~(pci_iov_resource_size(pdev, i) - 1);
+
+		*bar &= cpu_to_le32((u32)mask);
+		*bar |= vfio_generate_bar_flags(pdev, i);
+
+		if (*bar & cpu_to_le32(PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+			bar++;
+			*bar &= cpu_to_le32((u32)(mask >> 32));
+			i++;
+		}
+	}
+}
+
 /*
  * Pretend we're hardware and tweak the values of the *virtual* PCI BARs
  * to reflect the hardware capabilities.  This implements BAR sizing.
@@ -901,6 +930,165 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 	return 0;
 }
 
+static int __init init_pci_ext_cap_sriov_perm(struct perm_bits *perm)
+{
+	int i;
+
+	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_SRIOV]))
+		return -ENOMEM;
+
+	/*
+	 * Virtualize the first dword of all express capabilities
+	 * because it includes the next pointer.  This lets us later
+	 * remove capabilities from the chain if we need to.
+	 */
+	p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+
+	/* VF Enable - Virtualized and writable
+	 * Memory Space Enable - Non-virtualized and writable
+	 */
+	p_setw(perm, PCI_SRIOV_CTRL, PCI_SRIOV_CTRL_VFE,
+	       PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
+
+	p_setw(perm, PCI_SRIOV_NUM_VF, (u16)ALL_VIRT, (u16)ALL_WRITE);
+	p_setw(perm, PCI_SRIOV_SUP_PGSIZE, (u16)ALL_VIRT, 0);
+
+	/* We cannot let a user space application change the page size,
+	 * so we mark it as read only and trust the user application
+	 * (e.g. QEMU) to virtualize this correctly for the guest.
+	 */
+	p_setw(perm, PCI_SRIOV_SYS_PGSIZE, (u16)ALL_VIRT, 0);
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+		p_setd(perm, PCI_SRIOV_BAR + 4 * i, ALL_VIRT, ALL_WRITE);
+
+	return 0;
+}
+
+static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
+{
+	u8 cap;
+	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
+						 PCI_STD_HEADER_SIZEOF;
+	cap = vdev->pci_config_map[pos];
+
+	if (cap == PCI_CAP_ID_BASIC)
+		return 0;
+
+	/* XXX Can we have to abutting capabilities of the same type? */
+	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
+		pos--;
+
+	return pos;
+}
+
+static int vfio_sriov_cap_config_read(struct vfio_pci_device *vdev, int pos,
+				      int count, struct perm_bits *perm,
+				       int offset, __le32 *val)
+{
+	int cap_start = vfio_find_cap_start(vdev, pos);
+
+	vfio_sriov_bar_fixup(vdev, cap_start);
+	return vfio_default_config_read(vdev, pos, count, perm, offset, val);
+}
+
+struct disable_sriov_data {
+	struct work_struct work;
+	struct vfio_pci_device *vdev;
+};
+
+static void vfio_disable_sriov_work(struct work_struct *work)
+{
+	struct disable_sriov_data *data =
+		container_of(work, struct disable_sriov_data, work);
+
+	mutex_lock(&data->vdev->sriov_mutex);
+	pci_disable_sriov(data->vdev->pdev);
+	mutex_unlock(&data->vdev->sriov_mutex);
+}
+
+/* pci_disable_sriov may block waiting for all the VF drivers to unload.
+ * As a result, if a process were allowed to call pci_disable_sriov()
+ * directly, it could become unkillable by calling pci_disable_sriov()
+ * while holding a reference to one of the VFs.
+ * To address this issue, we defer the call to pci_disable_sriov()
+ * to a kernel thread that can't hold references on the VFs.
+ */
+static inline int vfio_disable_sriov(struct vfio_pci_device *vdev)
+{
+	struct disable_sriov_data *data = kmalloc(sizeof(*data), GFP_USER);
+
+	if (!data)
+		return -ENOMEM;
+
+	INIT_WORK(&data->work, vfio_disable_sriov_work);
+	data->vdev = vdev;
+
+	schedule_work(&data->work);
+	return 0;
+}
+
+static int vfio_sriov_cap_config_write(struct vfio_pci_device *vdev, int pos,
+				       int count, struct perm_bits *perm,
+				       int offset, __le32 val)
+{
+	int ret;
+	int cap_start = vfio_find_cap_start(vdev, pos);
+	u16 sriov_ctrl = *(u16 *)(vdev->vconfig + cap_start + PCI_SRIOV_CTRL);
+	bool cur_vf_enabled = sriov_ctrl & PCI_SRIOV_CTRL_VFE;
+	bool vf_enabled;
+
+	switch (offset) {
+	case  PCI_SRIOV_NUM_VF:
+	/* Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset
+	 * and VF Stride may change when NumVFs changes.
+	 *
+	 * Therefore we should pass valid writes to the hardware.
+	 *
+	 * Per SR-IOV spec sec 3.3.7
+	 * The results are undefined if NumVFs is set to a value greater
+	 * than TotalVFs.
+	 * NumVFs may only be written while VF Enable is Clear.
+	 * If NumVFs is written when VF Enable is Set, the results
+	 * are undefined.
+	 *
+	 * Avoid passing such writes to the hardware, just in case.
+	 */
+		if (cur_vf_enabled ||
+		    val > pci_sriov_get_totalvfs(vdev->pdev))
+			return count;
+
+		pci_iov_set_numvfs(vdev->pdev, val);
+		break;
+
+	case PCI_SRIOV_CTRL:
+		vf_enabled = val & PCI_SRIOV_CTRL_VFE;
+		ret = 0;
+
+		if (!cur_vf_enabled && vf_enabled) {
+			u16 num_vfs = *(u16 *)(vdev->vconfig +
+					cap_start +
+					PCI_SRIOV_NUM_VF);
+			mutex_lock(&vdev->sriov_mutex);
+			ret = pci_enable_sriov_with_override(vdev->pdev,
+							     num_vfs,
+							     "vfio-pci");
+			mutex_unlock(&vdev->sriov_mutex);
+		} else if (cur_vf_enabled && !vf_enabled) {
+			ret = vfio_disable_sriov(vdev);
+		}
+		if (ret)
+			return ret;
+		break;
+
+	default:
+		break;
+	}
+
+	return vfio_default_config_write(vdev, pos, count, perm,
+					 offset, val);
+}
+
 /*
  * Initialize the shared permission tables
  */
@@ -916,6 +1104,7 @@ void vfio_pci_uninit_perm_bits(void)
 
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
 }
 
 int __init vfio_pci_init_perm_bits(void)
@@ -938,29 +1127,16 @@ int __init vfio_pci_init_perm_bits(void)
 	ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 
+	ret |= init_pci_ext_cap_sriov_perm(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].readfn = vfio_sriov_cap_config_read;
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].writefn = vfio_sriov_cap_config_write;
+
 	if (ret)
 		vfio_pci_uninit_perm_bits();
 
 	return ret;
 }
 
-static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
-{
-	u8 cap;
-	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
-						 PCI_STD_HEADER_SIZEOF;
-	cap = vdev->pci_config_map[pos];
-
-	if (cap == PCI_CAP_ID_BASIC)
-		return 0;
-
-	/* XXX Can we have to abutting capabilities of the same type? */
-	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
-		pos--;
-
-	return pos;
-}
-
 static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 *val)
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 016c14a..3be9a61 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -88,6 +88,7 @@ struct vfio_pci_device {
 	int			refcnt;
 	struct eventfd_ctx	*err_trigger;
 	struct eventfd_ctx	*req_trigger;
+	struct mutex		sriov_mutex;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
-- 
1.8.3.1



* Re: [PATCH v2 1/2] PCI: Extend PCI IOV API
  2016-06-19 12:16 ` [PATCH v2 1/2] PCI: Extend PCI IOV API Ilya Lesokhin
@ 2016-06-19 14:10   ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2016-06-19 14:10 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kbuild-all, kvm, linux-pci, bhelgaas, alex.williamson, noaos,
	haggaie, ogerlitz, liranl, ilyal

[-- Attachment #1: Type: text/plain, Size: 3094 bytes --]

Hi,

[auto build test WARNING on pci/next]
[also build test WARNING on v4.7-rc3 next-20160617]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/VFIO-SRIOV-support/20160619-204913
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   lib/crc32.c:148: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:148: warning: Excess function parameter 'tab' description in 'crc32_le_generic'
   lib/crc32.c:293: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:293: warning: Excess function parameter 'tab' description in 'crc32_be_generic'
   lib/crc32.c:1: warning: no structured comments found
   mm/memory.c:2881: warning: No description found for parameter 'old'
>> drivers/pci/iov.c:644: warning: No description found for parameter 'driver_override'

vim +/driver_override +644 drivers/pci/iov.c

4449f0797 Wei Yang       2015-03-25  628  			max = dev->sriov->max_VF_buses;
a28724b0f Yu Zhao        2009-03-20  629  	}
a28724b0f Yu Zhao        2009-03-20  630  
a28724b0f Yu Zhao        2009-03-20  631  	return max ? max - bus->number : 0;
a28724b0f Yu Zhao        2009-03-20  632  }
dd7cc44d0 Yu Zhao        2009-03-20  633  
dd7cc44d0 Yu Zhao        2009-03-20  634  /**
7aec631bf Ilya Lesokhin  2016-06-19  635   * pci_enable_sriov_with_override - enable the SR-IOV capability
dd7cc44d0 Yu Zhao        2009-03-20  636   * @dev: the PCI device
52a8873ba Randy Dunlap   2009-04-01  637   * @nr_virtfn: number of virtual functions to enable
7aec631bf Ilya Lesokhin  2016-06-19  638   * driver_override: driver override for VFs
dd7cc44d0 Yu Zhao        2009-03-20  639   *
dd7cc44d0 Yu Zhao        2009-03-20  640   * Returns 0 on success, or negative on failure.
dd7cc44d0 Yu Zhao        2009-03-20  641   */
7aec631bf Ilya Lesokhin  2016-06-19  642  int pci_enable_sriov_with_override(struct pci_dev *dev, int nr_virtfn,
7aec631bf Ilya Lesokhin  2016-06-19  643  				   char *driver_override)
dd7cc44d0 Yu Zhao        2009-03-20 @644  {
dd7cc44d0 Yu Zhao        2009-03-20  645  	might_sleep();
dd7cc44d0 Yu Zhao        2009-03-20  646  
dd7cc44d0 Yu Zhao        2009-03-20  647  	if (!dev->is_physfn)
652d11004 Stefan Assmann 2013-07-31  648  		return -ENOSYS;
dd7cc44d0 Yu Zhao        2009-03-20  649  
7aec631bf Ilya Lesokhin  2016-06-19  650  	return sriov_enable(dev, nr_virtfn, driver_override);
7aec631bf Ilya Lesokhin  2016-06-19  651  }
7aec631bf Ilya Lesokhin  2016-06-19  652  EXPORT_SYMBOL_GPL(pci_enable_sriov_with_override);

:::::: The code at line 644 was first introduced by commit
:::::: dd7cc44d0bcec5e9c42fe52e88dc254ae62eac8d PCI: add SR-IOV API for Physical Function driver

:::::: TO: Yu Zhao <yu.zhao@intel.com>
:::::: CC: Jesse Barnes <jbarnes@virtuousgeek.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 6370 bytes --]


* Re: [PATCH v2 2/2] VFIO: Add support for SR-IOV extended capability
  2016-06-19 12:16 ` [PATCH v2 2/2] VFIO: Add support for SR-IOV extended capability Ilya Lesokhin
@ 2016-06-19 23:07   ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2016-06-19 23:07 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kbuild-all, kvm, linux-pci, bhelgaas, alex.williamson, noaos,
	haggaie, ogerlitz, liranl, ilyal

[-- Attachment #1: Type: text/plain, Size: 1858 bytes --]

Hi,

[auto build test ERROR on pci/next]
[also build test ERROR on v4.7-rc3 next-20160617]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/VFIO-SRIOV-support/20160619-204913
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: i386-randconfig-h0-06200543 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/vfio/pci/vfio_pci_config.c: In function 'vfio_sriov_bar_fixup':
>> drivers/vfio/pci/vfio_pci_config.c:461:11: error: 'PCI_IOV_RESOURCES' undeclared (first use in this function)
     for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
              ^~~~~~~~~~~~~~~~~
   drivers/vfio/pci/vfio_pci_config.c:461:11: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/vfio/pci/vfio_pci_config.c:461:35: error: 'PCI_IOV_RESOURCE_END' undeclared (first use in this function)
     for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
                                      ^~~~~~~~~~~~~~~~~~~~

vim +/PCI_IOV_RESOURCES +461 drivers/vfio/pci/vfio_pci_config.c

   455		int i;
   456		__le32 *bar;
   457		u64 mask;
   458	
   459		bar = (__le32 *)&vdev->vconfig[sriov_cap_start + PCI_SRIOV_BAR];
   460	
 > 461		for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
   462			if (!pci_resource_start(pdev, i)) {
   463				*bar = 0; /* Unmapped by host = unimplemented to user */
   464				continue;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 27227 bytes --]


* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-06-19 12:16 [PATCH v2 0/2] VFIO SRIOV support Ilya Lesokhin
  2016-06-19 12:16 ` [PATCH v2 1/2] PCI: Extend PCI IOV API Ilya Lesokhin
  2016-06-19 12:16 ` [PATCH v2 2/2] VFIO: Add support for SR-IOV extended capability Ilya Lesokhin
@ 2016-06-20 17:37 ` Alex Williamson
  2016-06-21  7:19   ` Ilya Lesokhin
  2 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-06-20 17:37 UTC (permalink / raw)
  To: Ilya Lesokhin; +Cc: kvm, linux-pci, bhelgaas, noaos, haggaie, ogerlitz, liranl

On Sun, 19 Jun 2016 15:16:55 +0300
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> Changes from V1:
> 	1. The VFs are no longer assigned to the PF's iommu group
> 	2. Add a pci_enable_sriov_with_override API to allow
> 	enabling SR-IOV without probing the VFs with the
> 	default driver

So without the iommu grouping, but with the driver override, VFs are
created and bound to vfio-pci, and we just hope for the best from
there, right?  Is that reasonable?  That means that a user can be
granted access to a PF, which they can use to create VFs, which
automatically get bound to vfio-pci, but from there anything can
happen.  The user doesn't automatically get access to them.  Nothing
prevents the devices from being unbound from vfio-pci and bound to a
host driver, though it does require some admin intervention.  Nothing
prevents a user created VF from being granted to another user.
Shutdown seems dodgy as well: where does vfio-pci guarantee that a user
created VF is shut down when the PF is released?  It seems vfio-pci
needs to gain some management of VFs, not just act as a passthrough from
the user.  The whole idea still seems fragile and questionably a valid
thing we should do, to me.  Thanks,

Alex

 
> Changes from RFC V2:
>         1. pci_disable_sriov() is now called from a workqueue
>         to avoid the situation where a process is blocked
>         in pci_disable_sriov() waiting for itself to release the VFs.
>         2. A mutex was added to synchronize calls to
>         pci_enable_sriov() and pci_disable_sriov().
> 
> Changes from RFC V1:
>         Due to the security concern raised in RFC V1, we add two patches
>         to make sure the VFs belong to the same IOMMU group as
>         the PF and are probed by VFIO.
> 
> Today the QEMU hypervisor allows assigning a physical device to a VM,
> facilitating driver development. However, it does not support enabling
> SR-IOV by the VM kernel driver. Our goal is to implement such support,
> allowing developers working on SR-IOV physical function drivers to work
> inside VMs as well.
> 
> This patch series implements the kernel side of our solution.  It extends
> the VFIO driver to support the PCIE SRIOV extended capability with
> following features:
> 1. The ability to probe SR-IOV BAR sizes.
> 2. The ability to enable and disable SR-IOV.
> 
> This patch series is going to be used by QEMU to expose SR-IOV capabilities
> to VM. We already have an early prototype based on Knut Omang's patches for
> SR-IOV[1].
> 
> Limitations:
> 1. Per SR-IOV spec section 3.3.12, PFs are required to support
> 4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes.
> Unfortunately, the kernel currently initializes the System Page Size register once
> and assumes it doesn't change therefore we cannot allow guests to change this
> register at will. We currently map both the Supported Page sizes and
> System Page Size as virtualized and read only in violation of the spec.
> In practice this is not an issue since both the hypervisor and the
> guest typically select the same System Page Size.
> 
> [1] https://github.com/knuto/qemu/tree/sriov_patches_v6
> 
> Ilya Lesokhin (2):
>   PCI: Extend PCI IOV API
>   VFIO: Add support for SR-IOV extended capability
> 
>  drivers/pci/iov.c                   |  41 +++++--
>  drivers/vfio/pci/vfio_pci.c         |   1 +
>  drivers/vfio/pci/vfio_pci_config.c  | 210 +++++++++++++++++++++++++++++++++---
>  drivers/vfio/pci/vfio_pci_private.h |   1 +
>  include/linux/pci.h                 |  13 ++-
>  5 files changed, 240 insertions(+), 26 deletions(-)
> 



* RE: [PATCH v2 0/2] VFIO SRIOV support
  2016-06-20 17:37 ` [PATCH v2 0/2] VFIO SRIOV support Alex Williamson
@ 2016-06-21  7:19   ` Ilya Lesokhin
  2016-06-21 15:45     ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Ilya Lesokhin @ 2016-06-21  7:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

Why is the need for admin privileges not enough to make it safe?
What's the difference between the following?
	1. A PF is assigned to one user and the admin assigns one of its VFs to a different user.
	2. The admin assigns the main network interface of the machine to a VM and brings down all the connectivity of the host.

Indeed we don't address cleanup, but we didn't see any adverse effect from the VFs remaining probed by the VFIO driver after the VM is shut down.

Would you be willing to accept this feature under some Kconfig option, or if it was allowed only for privileged processes?
I would hate to throw this feature away just because we can't address every corner case.

Thanks,
Ilya

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Monday, June 20, 2016 8:37 PM
> To: Ilya Lesokhin <ilyal@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> <liranl@mellanox.com>
> Subject: Re: [PATCH v2 0/2] VFIO SRIOV support
> 
> On Sun, 19 Jun 2016 15:16:55 +0300
> Ilya Lesokhin <ilyal@mellanox.com> wrote:
> 
> > Changes from V1:
> > 	1. The VFs are no longer assigned to the PF's iommu group
> > 	2. Add a pci_enable_sriov_with_override API to allow
> > 	enabling SR-IOV without probing the VFs with the
> > 	default driver
> 
> So without the iommu grouping, but with the driver override, VFs are
> created and bound to vfio-pci, and we just hope for the best from there,
> right?  Is that reasonable?  That means that a user can be granted access to a
> PF, which they can use to create VFs, which automatically get bound to vfio-
> pci, but from there anything can happen.  The user doesn't automatically get
> access to them.  Nothing prevents the devices from being unbound from
> vfio-pci and bound to a host driver, though it does require some admin
> intervention.  Nothing prevents a user created VF from being granted to
> another user.
> Shutdown seems dodgy as well, where does vfio-pci guarantee that a user
> created VF is shutdown when the PF is released?  It seems vfio-pci needs to
> gain some management of VFs, not just as a passthrough from the user.  The
> whole idea still seems fragile and questionably a valid thing we should do, to
> me.  Thanks,
> 
> Alex
> 
> 
> > Changes from RFC V2:
> >         1. pci_disable_sriov() is now called from a workqueue
> >         To avoid the situation where a process is blocked
> >         in pci_disable_sriov() waiting for itself to release the VFs.
> >         2. a mutex was added to synchronize calls to
> >         pci_enable_sriov() and pci_disable_sriov()
> >
> > Changes from RFC V1:
> >         Due to the security concern raised in RFC V1, we add two patches
> >         to make sure the VFs belong to the same IOMMU group as
> >         the PF and are probed by VFIO.
> >
> > Today the QEMU hypervisor allows assigning a physical device to a VM,
> > facilitating driver development. However, it does not support enabling
> > SR-IOV by the VM kernel driver. Our goal is to implement such support,
> > allowing developers working on SR-IOV physical function drivers to
> > work inside VMs as well.
> >
> > This patch series implements the kernel side of our solution.  It
> > extends the VFIO driver to support the PCIE SRIOV extended capability
> > with following features:
> > 1. The ability to probe SR-IOV BAR sizes.
> > 2. The ability to enable and disable SR-IOV.
> >
> > This patch series is going to be used by QEMU to expose SR-IOV
> > capabilities to VM. We already have an early prototype based on Knut
> > Omang's patches for SR-IOV[1].
> >
> > Limitations:
> > 1. Per SR-IOV spec section 3.3.12, PFs are required to support 4-KB,
> > 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes.
> > Unfortunately, the kernel currently initializes the System Page Size
> > register once and assumes it doesn't change; therefore we cannot allow
> > guests to change this register at will. We currently map both the
> > Supported Page Sizes and System Page Size as virtualized and read-only, in
> > violation of the spec.
> > In practice this is not an issue since both the hypervisor and the
> > guest typically select the same System Page Size.
> >
> > [1] https://github.com/knuto/qemu/tree/sriov_patches_v6
> >
> > Ilya Lesokhin (2):
> >   PCI: Extend PCI IOV API
> >   VFIO: Add support for SR-IOV extended capability
> >
> >  drivers/pci/iov.c                   |  41 +++++--
> >  drivers/vfio/pci/vfio_pci.c         |   1 +
> >  drivers/vfio/pci/vfio_pci_config.c  | 210
> +++++++++++++++++++++++++++++++++---
> >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> >  include/linux/pci.h                 |  13 ++-
> >  5 files changed, 240 insertions(+), 26 deletions(-)
> >


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-06-21  7:19   ` Ilya Lesokhin
@ 2016-06-21 15:45     ` Alex Williamson
  2016-07-14 14:53       ` Ilya Lesokhin
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-06-21 15:45 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

On Tue, 21 Jun 2016 07:19:14 +0000
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> Why is the need for admin privileges not enough to make it safe?

Seems like an opportunity for a phishing attempt, an exploited VM
generates VFs and hopes that they get assigned to other VMs or put to
other uses.  It simply allows one VM to create resources that are
really not tied to that VM and can easily be mis-used in other ways,
potentially even with the assumption that it's secure.

> What's the difference between the following?
> 	1. A PF is assigned to one user and the admin assigns one of its VFs to a different user.
> 	2. The admin assigns the main network interface of the machine to a VM and brings down all the connectivity of the host.

In case 1. does the admin realize what they've done?  How?  Maybe they
just start filing bugs when they shutdown the PF assigned VM or even
just reboot it and the VF assigned VM no longer has connectivity.  A  
typical user might just think "hey cool, now I can assign PFs and VFs"
w/o considering the implications of that.  There are aspects of
the host that we need to trust, allowing another VM to hold some
of that trust is not such an obvious thing.  Clearly in case 2. the
admin is going to pretty quickly figure out that they've done something
terribly wrong.
 
> Indeed we don't address clean up, but we didn't see any adverse effect from the VFs remaining probed by the VFIO driver after the VM is shutdown.

Yet that's clearly not the state the PF was in when assigned and those
VFs have no cleanup mechanism.  This is a big problem.

> Would you be willing to accept this feature under some kconfig option or if it was allowed only for Privileged processes?
> I would hate to throw this feature away just because we can't address every corner case.

I'm not willing to accept half-baked features, and letting them bitrot
under a config option that doesn't get regular use doesn't help.  If
this is really that useful to you then make it safe and predictable.  I
don't have answers to all the issues this generates; it might require
extensions to driver APIs or new mechanisms for vfio to associate device
ownership.  There might be valid use cases for a user-owned PF sourcing
VFs owned by other users, but obviously to me it sounds like a rat's
nest of security issues, and "because we can" and "useful development
tool" are not resounding motives for me to sign up trying to support
it.  Thanks,

Alex


* RE: [PATCH v2 0/2] VFIO SRIOV support
  2016-06-21 15:45     ` Alex Williamson
@ 2016-07-14 14:53       ` Ilya Lesokhin
  2016-07-14 17:03         ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Ilya Lesokhin @ 2016-07-14 14:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

Hi Alex,

Regarding the security model, I still think the need for admin privileges is enough,
but since you disagree, we can go with one of the following solutions:
1. Go back to the model where the VFs are assigned to the same IOMMU group as the PF.
2. Add an owner_pid to struct vfio_group and make sure in vfio_group_get_device_fd that
the PF's vfio_group is owned by the same process as the one that is trying to get an fd for a VF.

Will you accept either of these solutions?

Regarding the cleanup, I can fix it with a small bug fix
plus a call to pci_disable_sriov() inside vfio_pci_disable().

Thanks,
Ilya
> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, June 21, 2016 6:46 PM
> To: Ilya Lesokhin <ilyal@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> <liranl@mellanox.com>
> Subject: Re: [PATCH v2 0/2] VFIO SRIOV support
> 
> On Tue, 21 Jun 2016 07:19:14 +0000
> Ilya Lesokhin <ilyal@mellanox.com> wrote:
> 
> > Why is the need for admin privileges not enough to make it safe?
> 
> Seems like an opportunity for a phishing attempt, an exploited VM generates
> VFs and hopes that they get assigned to other VMs or put to other uses.  It
> simply allows one VM to create resources that are really not tied to that VM
> and can easily be mis-used in other ways, potentially even with the
> assumption that it's secure.
> 
> > What's the difference between the following?
> > 	1. A PF is assigned to one user and the admin assigns one of it VFs to
> a different user.
> > 	2. The admin assign the main network interface of the machine to a
> VM and bring down all the connectivity of the host.
> 
> In case 1. does the admin realize what they've done?  How?  Maybe they just
> start filing bugs when they shutdown the PF assigned VM or even just reboot
> it and the VF assigned VM no longer has connectivity.  A typical user might
> just think "hey cool, now I can assign PFs and VFs"
> w/o considering the implications of that.  There are aspects of the host that
> we need to trust, allowing another VM to hold some of that trust is not such
> an obvious thing.  Clearly in case 2. the admin is going to pretty quickly figure
> out that they've done something terribly wrong.
> 
> > Indeed we don't address clean up, but we didn't see any adverse effect
> from the VFs remaining probed by the VFIO driver after the VM is shutdown.
> 
> Yet that's clearly not the state the PF was in when assigned and those VFs
> have no cleanup mechanism.  This is a big problem.
> 
> > Would you be willing to accept this feature under some kconfig option or if
> it was allowed only for Privileged processes?
> > I would hate to throw this feature away just because we can't address
> every corner case.
> 
> I'm not willing to accept half baked features and letting them bitrot under a
> config option that doesn't get regular use doesn't help.  If this is really that
> useful to you then make it safe and predictable.  I don't have answers to all
> the issues this generates, it might require extensions to driver APIs or new
> mechanisms for vfio to associate device ownership.  There might be valid use
> cases for a user owned PF sourcing VFs owned by other users, but obviously
> to me it sounds like a rats nest of security issues and "because we can" and
> "useful development tool" are not resounding motives for me to sign up
> trying to support it.  Thanks,
> 
> Alex


* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-14 14:53       ` Ilya Lesokhin
@ 2016-07-14 17:03         ` Alex Williamson
  2016-07-17 10:05           ` Haggai Eran
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-07-14 17:03 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Haggai Eran,
	Or Gerlitz, Liran Liss

On Thu, 14 Jul 2016 14:53:10 +0000
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> Hi Alex,
> 
> Regarding the security model, I still think the need for admin privileges is enough,
> But Since you disagree, we can go with one of the following solutions:
> 1. Go back to the model where the VFs are assigned to the same IOMMU group as the PF.

This solves the problem but it's not useful.  VFs could only be used in
the same container as the PF, which eliminates one of the more widely
used use cases of the VF.  Why bother?

> 2. Add an owner_pid to struct vfio_group and make sure in vfio_group_get_device_fd that
> the PFs  vfio_group is owned by the same process as the one that is trying to get a fd for a VF.

This only solves a very specific use case, it doesn't address any of
the issues where the VF struct device in the host kernel might get
bound to another driver.  Also is the pid really what we want to attach
ownership to?  What if the owner of the PF wants to assign the VF to a
peer VM, not to a nested VM?  It seems like there's an entire aspect of
owning and being able to grant ownership of a device to a user or group
missing.

> Will you accept either of this solutions?

I don't see choice 1 as being useful and 2 still leaves significant
holes.  Thanks,

Alex

> Regarding the cleanup, I can fix it with a small bug fix
> + a call to pci_disable_sriov inside vfio_pci_disable.
> 
> Thanks,
> Ilya
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Tuesday, June 21, 2016 6:46 PM
> > To: Ilya Lesokhin <ilyal@mellanox.com>
> > Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com;
> > Noa Osherovich <noaos@mellanox.com>; Haggai Eran
> > <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss
> > <liranl@mellanox.com>
> > Subject: Re: [PATCH v2 0/2] VFIO SRIOV support
> > 
> > On Tue, 21 Jun 2016 07:19:14 +0000
> > Ilya Lesokhin <ilyal@mellanox.com> wrote:
> >   
> > > Why is the need for admin privileges not enough to make it safe?  
> > 
> > Seems like an opportunity for a phishing attempt, an exploited VM generates
> > VFs and hopes that they get assigned to other VMs or put to other uses.  It
> > simply allows one VM to create resources that are really not tied to that VM
> > and can easily be mis-used in other ways, potentially even with the
> > assumption that it's secure.
> >   
> > > What's the difference between the following?
> > > 	1. A PF is assigned to one user and the admin assigns one of it VFs to  
> > a different user.  
> > > 	2. The admin assign the main network interface of the machine to a  
> > VM and bring down all the connectivity of the host.
> > 
> > In case 1. does the admin realize what they've done?  How?  Maybe they just
> > start filing bugs when they shutdown the PF assigned VM or even just reboot
> > it and the VF assigned VM no longer has connectivity.  A typical user might
> > just think "hey cool, now I can assign PFs and VFs"
> > w/o considering the implications of that.  There are aspects of the host that
> > we need to trust, allowing another VM to hold some of that trust is not such
> > an obvious thing.  Clearly in case 2. the admin is going to pretty quickly figure
> > out that they've done something terribly wrong.
> >   
> > > Indeed we don't address clean up, but we didn't see any adverse effect  
> > from the VFs remaining probed by the VFIO driver after the VM is shutdown.
> > 
> > Yet that's clearly not the state the PF was in when assigned and those VFs
> > have no cleanup mechanism.  This is a big problem.
> >   
> > > Would you be willing to accept this feature under some kconfig option or if  
> > it was allowed only for Privileged processes?  
> > > I would hate to throw this feature away just because we can't address  
> > every corner case.
> > 
> > I'm not willing to accept half baked features and letting them bitrot under a
> > config option that doesn't get regular use doesn't help.  If this is really that
> > useful to you then make it safe and predictable.  I don't have answers to all
> > the issues this generates, it might require extensions to driver APIs or new
> > mechanisms for vfio to associate device ownership.  There might be valid use
> > cases for a user owned PF sourcing VFs owned by other users, but obviously
> > to me it sounds like a rats nest of security issues and "because we can" and
> > "useful development tool" are not resounding motives for me to sign up
> > trying to support it.  Thanks,
> > 
> > Alex  



* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-14 17:03         ` Alex Williamson
@ 2016-07-17 10:05           ` Haggai Eran
  2016-07-18 21:34             ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Haggai Eran @ 2016-07-17 10:05 UTC (permalink / raw)
  To: Alex Williamson, Ilya Lesokhin
  Cc: kvm, linux-pci, bhelgaas, Noa Osherovich, Or Gerlitz, Liran Liss

On 7/14/2016 8:03 PM, Alex Williamson wrote:
>> 2. Add an owner_pid to struct vfio_group and make sure in vfio_group_get_device_fd that
>> > the PFs  vfio_group is owned by the same process as the one that is trying to get a fd for a VF.
> This only solves a very specific use case, it doesn't address any of
> the issues where the VF struct device in the host kernel might get
> bound to another driver.  
The current patch uses driver_override to make the kernel use VFIO for
all the new VFs. It still allows the host kernel to bind them to another
driver, but that would require an explicit action on the part of the
administrator. Don't you think that is enough? Do you think we should
also take a reference on the VFIO devices to prevent an administrator
from unbinding them?

> Also is the pid really what we want to attach
> ownership to?  What if the owner of the PF wants to assign the VF to a
> peer VM, not to a nested VM?  It seems like there's an entire aspect of
> owning and being able to grant ownership of a device to a user or group
> missing.
In order to support assigning to a peer VM, maybe we can do something a
little different. What if we add an ioctl to enable SR-IOV, and the new
ioctl would return an open fd for the VFIO group of each VF? Other
attempts to open these groups by other processes would fail because the
groups would already be open. The process that enabled SR-IOV could
still pass these fds to other processes if needed, allowing other VMs to
use them.

What do you think?

Haggai



* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-17 10:05           ` Haggai Eran
@ 2016-07-18 21:34             ` Alex Williamson
  2016-07-19  7:06               ` Tian, Kevin
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-07-18 21:34 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Ilya Lesokhin, kvm, linux-pci, bhelgaas, Noa Osherovich,
	Or Gerlitz, Liran Liss

On Sun, 17 Jul 2016 13:05:21 +0300
Haggai Eran <haggaie@mellanox.com> wrote:

> On 7/14/2016 8:03 PM, Alex Williamson wrote:
> >> 2. Add an owner_pid to struct vfio_group and make sure in vfio_group_get_device_fd that  
> >> > the PFs  vfio_group is owned by the same process as the one that is trying to get a fd for a VF.  
> > This only solves a very specific use case, it doesn't address any of
> > the issues where the VF struct device in the host kernel might get
> > bound to another driver.    
> The current patch uses driver_override to make the kernel use VFIO for
> all the new VFs. It still allows the host kernel to bind them to another
> driver, but that would require an explicit action on the part of the
> administrator. Don't you think that is enough?

Binding the VFs to vfio-pci with driver_override just prevents any sort
of initial use by native host drivers, it doesn't in any way tie them to
the user that created them or prevent any normal operations on the
device.  The entire concept of a user-created device is new and
entirely separate from a user-owned device as typically used with
vfio.  We currently have an assumption with VF assignment that the PF
is trusted in the host, that's broken here and I have a hard time
blaming it on the admin or management tool for allowing such a thing
when it previously hasn't been a possibility.  If nothing else, it
seems like we're opening the system for phishing attempts where a user
of a PF creates VFs hoping they might be assigned to a victim VM, or
worse the host.

> Do you think we should
> also take a reference on the VFIO devices to prevent an administrator
> from unbinding them?

Who would be taking this reference?  It would need to be the host
kernel, we couldn't rely on the user to do it.  Wouldn't we be racing
other users to get that reference?  How would we bestow that reference
to a user?  How does this lead to a VF that can actually be used?

> > Also is the pid really what we want to attach
> > ownership to?  What if the owner of the PF wants to assign the VF to a
> > peer VM, not to a nested VM?  It seems like there's an entire aspect of
> > owning and being able to grant ownership of a device to a user or group
> > missing.  
> In order to support assigning to a peer VM, maybe we can do something a
> little different. What if we add a ioctl to enable SR-IOV, and the new
> ioctl would return an open fd for the VFIO group of each VF? Other
> attempts to open these groups by other processes would fail because the
> groups would already be open. The process that enabled SR-IOV could
> still pass these fds to other processes if needed, allowing other VMs to
> use them.

It's a good thought, but what happens when the user simply closes the VF
group(s)?  Are we back to a state that's any different from above?  We
cannot base the security of the host on the assumption that our user is
not malicious.  I'm also not sure how that enables any interesting use
cases.  One QEMU instance cannot spawn another, so what good is it if
QEMU has a group fd open?  Seems like that only facilitates exposing
the VF within the same process, but without understanding how close()
works, I don't see what that adds.

I really don't know what the ownership model looks like for
user-created devices and how we allow that ownership to be transferred
to other contexts, perhaps even back to a system context, but handling
them as just another device, perhaps only with an initial driver
binding is a scary proposition to me.  Thanks,

Alex


* RE: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-18 21:34             ` Alex Williamson
@ 2016-07-19  7:06               ` Tian, Kevin
  2016-07-19 15:10                 ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Tian, Kevin @ 2016-07-19  7:06 UTC (permalink / raw)
  To: Alex Williamson, Haggai Eran
  Cc: Ilya Lesokhin, kvm, linux-pci, bhelgaas, Noa Osherovich,
	Or Gerlitz, Liran Liss

> From: Alex Williamson
> Sent: Tuesday, July 19, 2016 5:34 AM
> 
> On Sun, 17 Jul 2016 13:05:21 +0300
> Haggai Eran <haggaie@mellanox.com> wrote:
> 
> > On 7/14/2016 8:03 PM, Alex Williamson wrote:
> > >> 2. Add an owner_pid to struct vfio_group and make sure in
> vfio_group_get_device_fd that
> > >> > the PFs  vfio_group is owned by the same process as the one that is trying to get
> a fd for a VF.
> > > This only solves a very specific use case, it doesn't address any of
> > > the issues where the VF struct device in the host kernel might get
> > > bound to another driver.
> > The current patch uses driver_override to make the kernel use VFIO for
> > all the new VFs. It still allows the host kernel to bind them to another
> > driver, but that would require an explicit action on the part of the
> > administrator. Don't you think that is enough?
> 
> Binding the VFs to vfio-pci with driver_override just prevents any sort
> of initial use by native host drivers, it doesn't in any way tie them to
> the user that created them or prevent any normal operations on the
> device.  The entire concept of a user-created device is new and
> entirely separate from a user-owned device as typically used with
> vfio.  We currently have an assumption with VF assignment that the PF
> is trusted in the host, that's broken here and I have a hard time
> blaming it on the admin or management tool for allowing such a thing
> when it previously hasn't been a possibility.  If nothing else, it
> seems like we're opening the system for phishing attempts where a user
> of a PF creates VFs hoping they might be assigned to a victim VM, or
> worse the host.
> 

What about fully virtualizing the SR-IOV capability? The VM is not allowed
to touch the physical SR-IOV capability directly, so there would not be a
problem of user-created devices. Physical SR-IOV is always enabled by the
admin on the host side. The admin can combine any number of VFs (even across
multiple compatible devices) in the virtual SR-IOV capability on any
passthrough device...

The limitation is that the VM can initially access only the PF's resources,
which are usually less than what the entire device provides, so this is not
that efficient when the VM doesn't want to enable SR-IOV at all.

Thanks
Kevin


* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-19  7:06               ` Tian, Kevin
@ 2016-07-19 15:10                 ` Alex Williamson
  2016-07-19 19:43                   ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-07-19 15:10 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Haggai Eran, Ilya Lesokhin, kvm, linux-pci, bhelgaas,
	Noa Osherovich, Or Gerlitz, Liran Liss

On Tue, 19 Jul 2016 07:06:34 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson
> > Sent: Tuesday, July 19, 2016 5:34 AM
> > 
> > On Sun, 17 Jul 2016 13:05:21 +0300
> > Haggai Eran <haggaie@mellanox.com> wrote:
> >   
> > > On 7/14/2016 8:03 PM, Alex Williamson wrote:  
> > > >> 2. Add an owner_pid to struct vfio_group and make sure in  
> > vfio_group_get_device_fd that  
> > > >> > the PFs  vfio_group is owned by the same process as the one that is trying to get  
> > a fd for a VF.  
> > > > This only solves a very specific use case, it doesn't address any of
> > > > the issues where the VF struct device in the host kernel might get
> > > > bound to another driver.  
> > > The current patch uses driver_override to make the kernel use VFIO for
> > > all the new VFs. It still allows the host kernel to bind them to another
> > > driver, but that would require an explicit action on the part of the
> > > administrator. Don't you think that is enough?  
> > 
> > Binding the VFs to vfio-pci with driver_override just prevents any sort
> > of initial use by native host drivers, it doesn't in any way tie them to
> > the user that created them or prevent any normal operations on the
> > device.  The entire concept of a user-created device is new and
> > entirely separate from a user-owned device as typically used with
> > vfio.  We currently have an assumption with VF assignment that the PF
> > is trusted in the host, that's broken here and I have a hard time
> > blaming it on the admin or management tool for allowing such a thing
> > when it previously hasn't been a possibility.  If nothing else, it
> > seems like we're opening the system for phishing attempts where a user
> > of a PF creates VFs hoping they might be assigned to a victim VM, or
> > worse the host.
> >   
> 
> What about fully virtualizing the SR-IOV capability? The VM is not allowed
> to touch physical SR-IOV capability directly so there would not be a problem
> of user-created devices. Physical SR-IOV is always enabled by admin at
> the host side. Admin can combine any number of VFs (even cross multiple
> compatible devices) in the virtual SR-IOV capability on any passthrough
> device...
> 
> The limitation is that the VM can initially access only PF resource which is 
> usually less than what the entire device provides, so not that efficient
> when the VM doesn't want to enable SR-IOV at all.

Are you suggesting a scenario where we have one PF with SR-IOV disabled
assigned to the user and another PF owned by the host with SR-IOV
enabled, where we virtualize SR-IOV to the user and use the VFs from the other
PF to act as a "pool" of VFs to be exposed to the user depending on
SR-IOV manipulation?  Something like that could work with existing
vfio, just requiring the QEMU bits to virtualize SR-IOV and manage the
VFs, but I expect it's not a useful model for Mellanox.  I believe it
was Ilya that stated the purpose in exposing SR-IOV was for
development, so I'm assuming they actually want to do development of
the PF SR-IOV enabling in a VM, not just give the illusion of SR-IOV to
the VM.  Thanks,

Alex


* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-19 15:10                 ` Alex Williamson
@ 2016-07-19 19:43                   ` Alex Williamson
  2016-07-21  5:51                     ` Tian, Kevin
  2016-07-25  7:53                     ` Haggai Eran
  0 siblings, 2 replies; 20+ messages in thread
From: Alex Williamson @ 2016-07-19 19:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Haggai Eran, Ilya Lesokhin, kvm, linux-pci, bhelgaas,
	Noa Osherovich, Or Gerlitz, Liran Liss

On Tue, 19 Jul 2016 09:10:17 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 19 Jul 2016 07:06:34 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson
> > > Sent: Tuesday, July 19, 2016 5:34 AM
> > > 
> > > On Sun, 17 Jul 2016 13:05:21 +0300
> > > Haggai Eran <haggaie@mellanox.com> wrote:
> > >     
> > > > On 7/14/2016 8:03 PM, Alex Williamson wrote:    
> > > > >> 2. Add an owner_pid to struct vfio_group and make sure in    
> > > vfio_group_get_device_fd that    
> > > > >> > the PFs  vfio_group is owned by the same process as the one that is trying to get    
> > > a fd for a VF.    
> > > > > This only solves a very specific use case, it doesn't address any of
> > > > > the issues where the VF struct device in the host kernel might get
> > > > > bound to another driver.    
> > > > The current patch uses driver_override to make the kernel use VFIO for
> > > > all the new VFs. It still allows the host kernel to bind them to another
> > > > driver, but that would require an explicit action on the part of the
> > > > administrator. Don't you think that is enough?    
> > > 
> > > Binding the VFs to vfio-pci with driver_override just prevents any sort
> > > of initial use by native host drivers, it doesn't in any way tie them to
> > > the user that created them or prevent any normal operations on the
> > > device.  The entire concept of a user-created device is new and
> > > entirely separate from a user-owned device as typically used with
> > > vfio.  We currently have an assumption with VF assignment that the PF
> > > is trusted in the host, that's broken here and I have a hard time
> > > blaming it on the admin or management tool for allowing such a thing
> > > when it previously hasn't been a possibility.  If nothing else, it
> > > seems like we're opening the system for phishing attempts where a user
> > > of a PF creates VFs hoping they might be assigned to a victim VM, or
> > > worse the host.
> > >     
> > 
> > What about fully virtualizing the SR-IOV capability? The VM is not allowed
> > to touch physical SR-IOV capability directly so there would not be a problem
> > of user-created devices. Physical SR-IOV is always enabled by admin at
> > the host side. Admin can combine any number of VFs (even cross multiple
> > compatible devices) in the virtual SR-IOV capability on any passthrough
> > device...
> > 
> > The limitation is that the VM can initially access only PF resource which is 
> > usually less than what the entire device provides, so not that efficient
> > when the VM doesn't want to enable SR-IOV at all.  
> 
> Are you suggesting a scenario where we have one PF with SR-IOV disabled
> assigned to the user and another PF owned by the host with SR-IOV
> enable, we virtualize SR-IOV to the user and use the VFs from the other
> PF to act as a "pool" of VFs to be exposed to the user depending on
> SR-IOV manipulation?  Something like that could work with existing
> vfio, just requiring the QEMU bits to virtualize SR-IOV and mange the
> VFs, but I expect it's not a useful model for Mellanox.  I believe it
> was Ilya that stated the purpose in exposing SR-IOV was for
> development, so I'm assuming they actually want to do development of
> the PF SR-IOV enabling in a VM, not just give the illusion of SR-IOV to
> the VM.  Thanks,


Thinking about this further, it seems that trying to create this IOV
enablement interface through a channel which is explicitly designed to
interact with an untrusted and potentially malicious user is the wrong
approach.  We already have an interface for a trusted entity to enable
VFs, it's through pci-sysfs.  Therefore if we were to use something like
libvirt to orchestrate the lifecycle of the VFs, I think we remove a
lot of the problems.  In this case QEMU would virtualize the SR-IOV
capability (maybe this is along the lines of what Kevin was thinking),
but that virtualization would take a path out through the QEMU QMP
interface to execute the SR-IOV change on the device rather than going
through the vfio kernel interface.  A management tool like libvirt
would then need to translate that into sysfs operations to create the
VFs and do whatever we're going to do with them (device_add them back
to the VM, make them available to a peer VM, make them available to
the host *gasp*).  VFIO in the kernel would need to add SR-IOV
support, but the only automatic SR-IOV path would be to disable IOV
when the PF is released; enabling would only occur through sysfs.  We
would probably need a new pci-sysfs interface to manage the driver for
newly created VFs though to avoid default host drivers
(sriov_driver_override?).  In this model QEMU is essentially just
making requests to other userspace entities to perform actions and how
those actions are performed can be left to userspace policy, not kernel
policy.  I think this would still satisfy the development use case, the
enabling path just takes a different route where privileged userspace
is more intimately involved in the process.  Thoughts?  Thanks,

Alex
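The privileged flow sketched above maps onto pci-sysfs attributes that already exist today (`sriov_numvfs` on the PF). A minimal sketch follows; the PF address `0000:03:00.0` and the mock sysfs tree are assumptions for illustration, not anything from the thread — point `SYSFS` at `/sys` (as root) to try it against real hardware.

```shell
#!/bin/sh
# Sketch of the trusted-entity (libvirt-side) flow: a management tool,
# acting on a QMP request from QEMU, enables and disables VFs through
# the PF's existing sriov_numvfs sysfs attribute.  We build a tiny mock
# sysfs tree so the sketch is runnable without hardware.
set -e
SYSFS="${SYSFS:-$(mktemp -d)/sys}"          # set SYSFS=/sys on a real host
PF="$SYSFS/bus/pci/devices/0000:03:00.0"    # hypothetical PF address
mkdir -p "$PF"
echo 0 > "$PF/sriov_numvfs"

# Enable 4 VFs: on real sysfs this makes the PF driver spawn 4 VFs.
echo 4 > "$PF/sriov_numvfs"
echo "VFs enabled: $(cat "$PF/sriov_numvfs")"

# Tear them down again (the automatic path on PF release writes 0 here).
echo 0 > "$PF/sriov_numvfs"
echo "VFs enabled: $(cat "$PF/sriov_numvfs")"
```

Note that real sysfs requires writing 0 before changing a nonzero VF count; the mock tree does not enforce that, so the sketch keeps the same discipline explicitly.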

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-19 19:43                   ` Alex Williamson
@ 2016-07-21  5:51                     ` Tian, Kevin
  2016-07-25  7:53                     ` Haggai Eran
  1 sibling, 0 replies; 20+ messages in thread
From: Tian, Kevin @ 2016-07-21  5:51 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Haggai Eran, Ilya Lesokhin, kvm, linux-pci, bhelgaas,
	Noa Osherovich, Or Gerlitz, Liran Liss

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, July 20, 2016 3:44 AM
> 
> On Tue, 19 Jul 2016 09:10:17 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Tue, 19 Jul 2016 07:06:34 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > > From: Alex Williamson
> > > > Sent: Tuesday, July 19, 2016 5:34 AM
> > > >
> > > > On Sun, 17 Jul 2016 13:05:21 +0300
> > > > Haggai Eran <haggaie@mellanox.com> wrote:
> > > >
> > > > > On 7/14/2016 8:03 PM, Alex Williamson wrote:
> > > > > >> 2. Add an owner_pid to struct vfio_group and make sure in
> > > > vfio_group_get_device_fd that
> > > > > >> > the PF's vfio_group is owned by the same process as the one that is trying
> to get
> > > > a fd for a VF.
> > > > > > This only solves a very specific use case, it doesn't address any of
> > > > > > the issues where the VF struct device in the host kernel might get
> > > > > > bound to another driver.
> > > > > The current patch uses driver_override to make the kernel use VFIO for
> > > > > all the new VFs. It still allows the host kernel to bind them to another
> > > > > driver, but that would require an explicit action on the part of the
> > > > > administrator. Don't you think that is enough?
> > > >
> > > > Binding the VFs to vfio-pci with driver_override just prevents any sort
> > > > of initial use by native host drivers, it doesn't in any way tie them to
> > > > the user that created them or prevent any normal operations on the
> > > > device.  The entire concept of a user-created device is new and
> > > > entirely separate from a user-owned device as typically used with
> > > > vfio.  We currently have an assumption with VF assignment that the PF
> > > > is trusted in the host, that's broken here and I have a hard time
> > > > blaming it on the admin or management tool for allowing such a thing
> > > > when it previously hasn't been a possibility.  If nothing else, it
> > > > seems like we're opening the system for phishing attempts where a user
> > > > of a PF creates VFs hoping they might be assigned to a victim VM, or
> > > > worse the host.
> > > >
> > >
> > > What about fully virtualizing the SR-IOV capability? The VM is not allowed
> > > to touch physical SR-IOV capability directly so there would not be a problem
> > > of user-created devices. Physical SR-IOV is always enabled by admin at
> > > the host side. Admin can combine any number of VFs (even cross multiple
> > > compatible devices) in the virtual SR-IOV capability on any passthrough
> > > device...
> > >
> > > The limitation is that the VM can initially access only PF resource which is
> > > usually less than what the entire device provides, so not that efficient
> > > when the VM doesn't want to enable SR-IOV at all.
> >
> > Are you suggesting a scenario where we have one PF with SR-IOV disabled
> > assigned to the user and another PF owned by the host with SR-IOV
> > enabled, we virtualize SR-IOV to the user and use the VFs from the other
> > PF to act as a "pool" of VFs to be exposed to the user depending on
> > SR-IOV manipulation?  Something like that could work with existing
> > vfio, just requiring the QEMU bits to virtualize SR-IOV and manage the
> > VFs, but I expect it's not a useful model for Mellanox.  I believe it
> > was Ilya who stated that the purpose of exposing SR-IOV was for
> > development, so I'm assuming they actually want to do development of
> > the PF SR-IOV enabling in a VM, not just give the illusion of SR-IOV to
> > the VM.  Thanks,
> 
> 
> Thinking about this further, it seems that trying to create this IOV
> enablement interface through a channel which is explicitly designed to
> interact with an untrusted and potentially malicious user is the wrong
> approach.  We already have an interface for a trusted entity to enable
> VFs, it's through pci-sysfs.  Therefore if we were to use something like
> libvirt to orchestrate the lifecycle of the VFs, I think we remove a
> lot of the problems.  In this case QEMU would virtualize the SR-IOV
> capability (maybe this is along the lines of what Kevin was thinking),
> but that virtualization would take a path out through the QEMU QMP
> interface to execute the SR-IOV change on the device rather than going
> through the vfio kernel interface.  A management tool like libvirt
> would then need to translate that into sysfs operations to create the
> VFs and do whatever we're going to do with them (device_add them back
> to the VM, make them available to a peer VM, make them available to
> the host *gasp*).  VFIO in the kernel would need to add SR-IOV
> support, but the only automatic SR-IOV path would be to disable IOV
> when the PF is released, enabling would only occur through sysfs.  We
> would probably need a new pci-sysfs interface to manage the driver for
> newly created VFs though to avoid default host drivers
> (sriov_driver_override?).  In this model QEMU is essentially just
> making requests to other userspace entities to perform actions and how
> those actions are performed can be left to userspace policy, not kernel
> policy.  I think this would still satisfy the development use case, the
> enabling path just takes a different route where privileged userspace
> is more intimately involved in the process.  Thoughts?  Thanks,
> 

A sound direction to me. Thanks


* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-19 19:43                   ` Alex Williamson
  2016-07-21  5:51                     ` Tian, Kevin
@ 2016-07-25  7:53                     ` Haggai Eran
  2016-07-25 15:07                       ` Alex Williamson
  1 sibling, 1 reply; 20+ messages in thread
From: Haggai Eran @ 2016-07-25  7:53 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: Ilya Lesokhin, kvm, linux-pci, bhelgaas, Noa Osherovich,
	Or Gerlitz, Liran Liss

On 7/19/2016 10:43 PM, Alex Williamson wrote:
> Thinking about this further, it seems that trying to create this IOV
> enablement interface through a channel which is explicitly designed to
> interact with an untrusted and potentially malicious user is the wrong
> approach.  We already have an interface for a trusted entity to enable
> VFs, it's through pci-sysfs.  Therefore if we were to use something like
> libvirt to orchestrate the lifecycle of the VFs, I think we remove a
> lot of the problems.  In this case QEMU would virtualize the SR-IOV
> capability (maybe this is along the lines of what Kevin was thinking),
> but that virtualization would take a path out through the QEMU QMP
> interface to execute the SR-IOV change on the device rather than going
> through the vfio kernel interface.  A management tool like libvirt
> would then need to translate that into sysfs operations to create the
> VFs and do whatever we're going to do with them (device_add them back
> to the VM, make them available to a peer VM, make them available to
> the host *gasp*).  VFIO in the kernel would need to add SR-IOV
> support, but the only automatic SR-IOV path would be to disable IOV
> when the PF is released, enabling would only occur through sysfs.  We
> would probably need a new pci-sysfs interface to manage the driver for
> newly created VFs though to avoid default host drivers
> (sriov_driver_override?).  In this model QEMU is essentially just
> making requests to other userspace entities to perform actions and how
> those actions are performed can be left to userspace policy, not kernel
> policy.  I think this would still satisfy the development use case, the
> enabling path just takes a different route where privileged userspace
> is more intimately involved in the process.  Thoughts?  Thanks,

I understand the desire to use a different interface such as sysfs for
the trusted user-space component. I'm not sure though how using
sriov_driver_override solves the issues we have been discussing. After
SR-IOV is enabled by libvirt, it is still possible for the administrator
(or another trusted daemon racing with libvirt) to open the VFs with
VFIO before libvirt had a chance to open them and send them to QEMU.

Are you okay with that?

Thanks,
Haggai


* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-25  7:53                     ` Haggai Eran
@ 2016-07-25 15:07                       ` Alex Williamson
  2016-07-25 15:34                         ` Ilya Lesokhin
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-07-25 15:07 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Tian, Kevin, Ilya Lesokhin, kvm, linux-pci, bhelgaas,
	Noa Osherovich, Or Gerlitz, Liran Liss

On Mon, 25 Jul 2016 10:53:56 +0300
Haggai Eran <haggaie@mellanox.com> wrote:

> On 7/19/2016 10:43 PM, Alex Williamson wrote:
> > Thinking about this further, it seems that trying to create this IOV
> > enablement interface through a channel which is explicitly designed to
> > interact with an untrusted and potentially malicious user is the wrong
> > approach.  We already have an interface for a trusted entity to enable
> > VFs, it's through pci-sysfs.  Therefore if we were to use something like
> > libvirt to orchestrate the lifecycle of the VFs, I think we remove a
> > lot of the problems.  In this case QEMU would virtualize the SR-IOV
> > capability (maybe this is along the lines of what Kevin was thinking),
> > but that virtualization would take a path out through the QEMU QMP
> > interface to execute the SR-IOV change on the device rather than going
> > through the vfio kernel interface.  A management tool like libvirt
> > would then need to translate that into sysfs operations to create the
> > VFs and do whatever we're going to do with them (device_add them back
> > to the VM, make them available to a peer VM, make them available to
> > the host *gasp*).  VFIO in the kernel would need to add SR-IOV
> > support, but the only automatic SR-IOV path would be to disable IOV
> > when the PF is released, enabling would only occur through sysfs.  We
> > would probably need a new pci-sysfs interface to manage the driver for
> > newly created VFs though to avoid default host drivers
> > (sriov_driver_override?).  In this model QEMU is essentially just
> > making requests to other userspace entities to perform actions and how
> > those actions are performed can be left to userspace policy, not kernel
> > policy.  I think this would still satisfy the development use case, the
> > enabling path just takes a different route where privileged userspace
> > is more intimately involved in the process.  Thoughts?  Thanks,  
> 
> I understand the desire to use a different interface such as sysfs for
> the trusted user-space component. I'm not sure though how using
> sriov_driver_override solves the issues we have been discussing. After
> SR-IOV is enabled by libvirt, it is still possible for the administrator
> (or another trusted daemon racing with libvirt) to open the VFs with
> VFIO before libvirt had a chance to open them and send them to QEMU.
> 
> Are you okay with that?

If a privileged entity like libvirt is creating VFs on behalf of a
user, it's going to be that entity's responsibility to claim ownership
of all those created VFs.  sriov_driver_override is just one suggestion
that might help facilitate that.  Others might be necessary, but
ultimately it's not the kernel's problem to work out which entity gets
to take ownership of those devices, that's a userspace problem, just as
it is already today.  Thanks,

Alex


* RE: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-25 15:07                       ` Alex Williamson
@ 2016-07-25 15:34                         ` Ilya Lesokhin
  2016-07-25 15:58                           ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Ilya Lesokhin @ 2016-07-25 15:34 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, kvm, linux-pci, bhelgaas, Noa Osherovich,
	Or Gerlitz, Liran Liss, Haggai Eran

Hi Alex,
It seems that I'm missing something because I fail to see how the sysfs interface solves
any of the problems you've pointed out in the current code.

Can you please clarify what you want us to do on the kernel side besides moving to the sysfs interface?
How do we prevent the administrator from unbinding the VFs from vfio and binding them to the default driver?
How does the new model prevent the administrator from assigning the VFs to other VMs?

Thanks
Ilya

-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@redhat.com] 
Sent: Monday, July 25, 2016 6:08 PM
To: Haggai Eran <haggaie@mellanox.com>
Cc: Tian, Kevin <kevin.tian@intel.com>; Ilya Lesokhin <ilyal@mellanox.com>; kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com; Noa Osherovich <noaos@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss <liranl@mellanox.com>
Subject: Re: [PATCH v2 0/2] VFIO SRIOV support

On Mon, 25 Jul 2016 10:53:56 +0300
Haggai Eran <haggaie@mellanox.com> wrote:

> On 7/19/2016 10:43 PM, Alex Williamson wrote:
> > Thinking about this further, it seems that trying to create this IOV 
> > enablement interface through a channel which is explicitly designed 
> > to interact with an untrusted and potentially malicious user is the 
> > wrong approach.  We already have an interface for a trusted entity 
> > to enable VFs, it's through pci-sysfs.  Therefore if we were to use 
> > something like libvirt to orchestrate the lifecycle of the VFs, I 
> > think we remove a lot of the problems.  In this case QEMU would 
> > virtualize the SR-IOV capability (maybe this is along the lines of 
> > what Kevin was thinking), but that virtualization would take a path 
> > out through the QEMU QMP interface to execute the SR-IOV change on 
> > the device rather than going through the vfio kernel interface.  A 
> > management tool like libvirt would then need to translate that into 
> > sysfs operations to create the VFs and do whatever we're going to do 
> > with them (device_add them back to the VM, make them available to a 
> > peer VM, make them available to the host *gasp*).  VFIO in the 
> > kernel would need to add SR-IOV support, but the only automatic 
> > SR-IOV path would be to disable IOV when the PF is released, 
> > enabling would only occur through sysfs.  We would probably need a 
> > new pci-sysfs interface to manage the driver for newly created VFs 
> > though to avoid default host drivers (sriov_driver_override?).  In 
> > this model QEMU is essentially just making requests to other 
> > userspace entities to perform actions and how those actions are 
> > performed can be left to userspace policy, not kernel policy.  I 
> > think this would still satisfy the development use case, the 
> > enabling path just takes a different route where privileged 
> > userspace is more intimately involved in the process.  Thoughts?  
> > Thanks,
> 
> I understand the desire to use a different interface such as sysfs for 
> the trusted user-space component. I'm not sure though how using 
> sriov_driver_override solves the issues we have been discussing. After 
> SR-IOV is enabled by libvirt, it is still possible for the 
> administrator (or another trusted daemon racing with libvirt) to open 
> the VFs with VFIO before libvirt had a chance to open them and send them to QEMU.
> 
> Are you okay with that?

If a privileged entity like libvirt is creating VFs on behalf of a user, it's going to be that entity's responsibility to claim ownership of all those created VFs.  sriov_driver_override is just one suggestion that might help facilitate that.  Others might be necessary, but ultimately it's not the kernel's problem to work out which entity gets to take ownership of those devices, that's a userspace problem, just as it is already today.  Thanks,

Alex


* Re: [PATCH v2 0/2] VFIO SRIOV support
  2016-07-25 15:34                         ` Ilya Lesokhin
@ 2016-07-25 15:58                           ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2016-07-25 15:58 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: Tian, Kevin, kvm, linux-pci, bhelgaas, Noa Osherovich,
	Or Gerlitz, Liran Liss, Haggai Eran

On Mon, 25 Jul 2016 15:34:34 +0000
Ilya Lesokhin <ilyal@mellanox.com> wrote:

> Hi Alex,
> It seems that I'm missing something because I fail to see how the sysfs interface solves
> any of the problems you've pointed out in the current code.

If the kernel allows a user who owns a PF to spawn VFs, then it becomes
the kernel's problem how to define and enforce the policy around those
devices (i.e. who is allowed to use them, how ownership gets
transferred, what we consider secure vs. insecure, etc.).  On the
other hand if we do not allow a user that direct path and they need to
interact through a trusted channel to create those VFs, then it becomes
the problem of the entity creating them how to address those policy
questions.  That's a big difference.  In general I think it's a bad
idea to implement policy in the kernel, the kernel should provide the
tools to allow userspace to manage the policy.
 
> Can you please clarify what you want us to do on the kernel side besides moving to the sysfs interface?
> How do we prevent the administrator from unbinding the VFs from vfio and binding them to the default driver?
> How does the new model prevent the administrator from assigning the VFs to other VMs?

In both cases the answer is "we don't".  Those are questions that would
need to be addressed if the kernel was implementing the policy.  The
kernel already has an interface for creating SRIOV VFs and it's up to
userspace how to manage that in a secure way.  If someone has a use
case where a user-owned PF spawns a host-kernel-owned VF, that's for
userspace to determine whether it's valid.  sriov_driver_override is
only a suggestion towards facilitating user managed VFs, allowing the
creator of the VFs to more concisely control the driver binding for
those VFs.  Thanks,

Alex
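The `sriov_driver_override` attribute discussed here is only a suggestion in this thread, but the per-device `driver_override` mechanism it would generalize already exists. A hedged sketch of how the creator of a VF could pin it to vfio-pci before any host driver probes it — the VF address `0000:03:00.1` is hypothetical, and a mock sysfs tree stands in for `/sys`:

```shell
#!/bin/sh
# Sketch: pin a newly created VF to vfio-pci via the existing per-device
# driver_override sysfs attribute, so no default host driver claims it.
# (sriov_driver_override, per-PF, is only a proposal in the thread.)
set -e
SYSFS="${SYSFS:-$(mktemp -d)/sys}"          # set SYSFS=/sys on a real host
VF="$SYSFS/bus/pci/devices/0000:03:00.1"    # hypothetical VF address
mkdir -p "$VF"

# After this write, only the named driver will match the device on its
# next probe attempt:
echo vfio-pci > "$VF/driver_override"

# On a real host the tool would then trigger the probe, e.g.:
#   echo 0000:03:00.1 > /sys/bus/pci/drivers_probe
echo "override: $(cat "$VF/driver_override")"
```

This is the per-VF legwork a privileged entity like libvirt would do for each VF it creates on the user's behalf, before handing the device to QEMU.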

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com] 
> Sent: Monday, July 25, 2016 6:08 PM
> To: Haggai Eran <haggaie@mellanox.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Ilya Lesokhin <ilyal@mellanox.com>; kvm@vger.kernel.org; linux-pci@vger.kernel.org; bhelgaas@google.com; Noa Osherovich <noaos@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss <liranl@mellanox.com>
> Subject: Re: [PATCH v2 0/2] VFIO SRIOV support
> 
> On Mon, 25 Jul 2016 10:53:56 +0300
> Haggai Eran <haggaie@mellanox.com> wrote:
> 
> > On 7/19/2016 10:43 PM, Alex Williamson wrote:  
> > > Thinking about this further, it seems that trying to create this IOV 
> > > enablement interface through a channel which is explicitly designed 
> > > to interact with an untrusted and potentially malicious user is the 
> > > wrong approach.  We already have an interface for a trusted entity 
> > > to enable VFs, it's through pci-sysfs.  Therefore if we were to use 
> > > something like libvirt to orchestrate the lifecycle of the VFs, I 
> > > think we remove a lot of the problems.  In this case QEMU would 
> > > virtualize the SR-IOV capability (maybe this is along the lines of 
> > > what Kevin was thinking), but that virtualization would take a path 
> > > out through the QEMU QMP interface to execute the SR-IOV change on 
> > > the device rather than going through the vfio kernel interface.  A 
> > > management tool like libvirt would then need to translate that into 
> > > sysfs operations to create the VFs and do whatever we're going to do 
> > > with them (device_add them back to the VM, make them available to a 
> > > peer VM, make them available to the host *gasp*).  VFIO in the 
> > > kernel would need to add SR-IOV support, but the only automatic 
> > > SR-IOV path would be to disable IOV when the PF is released, 
> > > enabling would only occur through sysfs.  We would probably need a 
> > > new pci-sysfs interface to manage the driver for newly created VFs 
> > > though to avoid default host drivers (sriov_driver_override?).  In 
> > > this model QEMU is essentially just making requests to other 
> > > userspace entities to perform actions and how those actions are 
> > > performed can be left to userspace policy, not kernel policy.  I 
> > > think this would still satisfy the development use case, the 
> > > enabling path just takes a different route where privileged 
> > > userspace is more intimately involved in the process.  Thoughts?  
> > > Thanks,  
> > 
> > I understand the desire to use a different interface such as sysfs for 
> > the trusted user-space component. I'm not sure though how using 
> > sriov_driver_override solves the issues we have been discussing. After 
> > SR-IOV is enabled by libvirt, it is still possible for the 
> > administrator (or another trusted daemon racing with libvirt) to open 
> > the VFs with VFIO before libvirt had a chance to open them and send them to QEMU.
> > 
> > Are you okay with that?  
> 
> If a privileged entity like libvirt is creating VFs on behalf of a user, it's going to be that entity's responsibility to claim ownership of all those created VFs.  sriov_driver_override is just one suggestion that might help facilitate that.  Others might be necessary, but ultimately it's not the kernel's problem to work out which entity gets to take ownership of those devices, that's a userspace problem, just as it is already today.  Thanks,
> 
> Alex



Thread overview: 20+ messages
2016-06-19 12:16 [PATCH v2 0/2] VFIO SRIOV support Ilya Lesokhin
2016-06-19 12:16 ` [PATCH v2 1/2] PCI: Extend PCI IOV API Ilya Lesokhin
2016-06-19 14:10   ` kbuild test robot
2016-06-19 12:16 ` [PATCH v2 2/2] VFIO: Add support for SR-IOV extended capablity Ilya Lesokhin
2016-06-19 23:07   ` kbuild test robot
2016-06-20 17:37 ` [PATCH v2 0/2] VFIO SRIOV support Alex Williamson
2016-06-21  7:19   ` Ilya Lesokhin
2016-06-21 15:45     ` Alex Williamson
2016-07-14 14:53       ` Ilya Lesokhin
2016-07-14 17:03         ` Alex Williamson
2016-07-17 10:05           ` Haggai Eran
2016-07-18 21:34             ` Alex Williamson
2016-07-19  7:06               ` Tian, Kevin
2016-07-19 15:10                 ` Alex Williamson
2016-07-19 19:43                   ` Alex Williamson
2016-07-21  5:51                     ` Tian, Kevin
2016-07-25  7:53                     ` Haggai Eran
2016-07-25 15:07                       ` Alex Williamson
2016-07-25 15:34                         ` Ilya Lesokhin
2016-07-25 15:58                           ` Alex Williamson
