* [RFC 0/2] VFIO SRIOV support
@ 2015-12-22 13:42 Ilya Lesokhin
  2015-12-22 13:42 ` [RFC 1/2] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ilya Lesokhin @ 2015-12-22 13:42 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Today the QEMU hypervisor allows assigning a physical device to a VM,
facilitating driver development. However, it does not support enabling
SR-IOV by the VM kernel driver. Our goal is to implement such support,
allowing developers working on SR-IOV physical function drivers to work
inside VMs as well.

This patch series implements the kernel side of our solution.  It extends
the VFIO driver to support the PCIe SR-IOV extended capability with the
following features:
1. The ability to probe SR-IOV BAR sizes.
2. The ability to enable and disable SR-IOV.

This patch series is going to be used by QEMU to expose SR-IOV capabilities
to the VM. We already have an early prototype based on Knut Omang's SR-IOV
patches for QEMU [1].

Open issues:
1. Binding the new VFs to the VFIO driver.
Once the VM enables SR-IOV, it expects the new VFs to appear inside the VM.
To this end we need to bind the new VFs to the VFIO driver and have QEMU
grab them. We currently achieve this using:
echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
but we are not happy with this solution, as the system might have another
device with the same ID that is unrelated to our VM.
Other solutions we've considered are:
 a. Having user space unbind the VFs and then bind them to VFIO
    (a rough sketch follows after this list). This typically results
    in an unnecessary probe of the device by its default driver.
 b. Adding a driver argument to pci_enable_sriov(...) and having
    vfio call pci_enable_sriov() with the vfio driver as the argument.
    This avoids the unnecessary probe but is more intrusive.
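
For reference, a rough sketch of solution (a) using the driver_override
mechanism (available since kernel 3.16); the VF address below is only a
placeholder:

  VF=0000:06:10.0                 # example BDF of a newly created VF
  # Release the VF from whatever default driver may have probed it.
  if [ -e /sys/bus/pci/devices/$VF/driver ]; then
      echo "$VF" > /sys/bus/pci/devices/$VF/driver/unbind
  fi
  # Restrict matching for this one device to vfio-pci and re-probe it,
  # avoiding the blanket vendor/device match that new_id implies.
  echo vfio-pci > /sys/bus/pci/devices/$VF/driver_override
  echo "$VF"    > /sys/bus/pci/drivers_probe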

2. How to tell if it is safe to disable SR-IOV?
In the current implementation, userspace can enable SR-IOV, grab one of
the VFs and then disable SR-IOV without releasing the device.  This
results in a deadlock where the user process is stuck in the SR-IOV
disable path waiting for itself to release the device. Killing the
process leaves it in a zombie state.
We also get a strange warning:
[  181.668492] WARNING: CPU: 22 PID: 3684 at kernel/sched/core.c:7497 __might_sleep+0x77/0x80() 
[  181.668502] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810aa193>] prepare_to_wait_event+0x63/0xf0

3. How to expose the Supported Page Sizes and System Page Size registers in
the SR-IOV capability?
Presently the hypervisor initializes Supported Page Sizes once and assumes
it doesn't change, so we cannot allow user space to change this register at
will. The first solution that comes to mind is to expose a device that only
supports the page size selected by the hypervisor. Unfortunately, per SR-IOV
spec section 3.3.12, PFs are required to support 4-KB, 8-KB, 64-KB, 256-KB,
1-MB, and 4-MB page sizes. We currently map both registers as virtualized
and read-only and leave user space to deal with this problem.

4. Other SR-IOV capabilities.
Do we want to hide capabilities we do not support in the SR-IOV
Capabilities register, or leave that to the userspace application?

[1] https://github.com/knuto/qemu/tree/sriov_patches_v6

Ilya Lesokhin (2):
  PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  VFIO: Add support for SRIOV extended capability

 drivers/pci/iov.c                  |   4 +-
 drivers/vfio/pci/vfio_pci_config.c | 169 +++++++++++++++++++++++++++++++++----
 include/linux/pci.h                |   4 +
 3 files changed, 159 insertions(+), 18 deletions(-)

-- 
1.8.3.1



* [RFC 1/2] PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  2015-12-22 13:42 [RFC 0/2] VFIO SRIOV support Ilya Lesokhin
@ 2015-12-22 13:42 ` Ilya Lesokhin
  2015-12-22 13:42 ` [RFC 2/2] VFIO: Add support for SRIOV extended capability Ilya Lesokhin
  2015-12-22 15:35 ` [RFC 0/2] VFIO SRIOV support Alex Williamson
  2 siblings, 0 replies; 10+ messages in thread
From: Ilya Lesokhin @ 2015-12-22 13:42 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Expose pci_iov_set_numvfs and pci_iov_resource_size to make them
available to modules, for VFIO-PCI SR-IOV support.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/pci/iov.c   | 4 +++-
 include/linux/pci.h | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..f296bd3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -41,7 +41,7 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
  *
  * Update iov->offset and iov->stride when NumVFs is written.
  */
-static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 {
 	struct pci_sriov *iov = dev->sriov;
 
@@ -49,6 +49,7 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
+EXPORT_SYMBOL(pci_iov_set_numvfs);
 
 /*
  * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
@@ -107,6 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 
 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
+EXPORT_SYMBOL(pci_iov_resource_size);
 
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e90eb22..1039e18 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1724,6 +1724,8 @@ int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
+
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn);
 #else
 static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
@@ -1745,6 +1747,8 @@ static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 { return 0; }
+
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) { }
 #endif
 
 #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
-- 
1.8.3.1



* [RFC 2/2] VFIO: Add support for SRIOV extended capability
  2015-12-22 13:42 [RFC 0/2] VFIO SRIOV support Ilya Lesokhin
  2015-12-22 13:42 ` [RFC 1/2] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
@ 2015-12-22 13:42 ` Ilya Lesokhin
  2015-12-22 15:35 ` [RFC 0/2] VFIO SRIOV support Alex Williamson
  2 siblings, 0 replies; 10+ messages in thread
From: Ilya Lesokhin @ 2015-12-22 13:42 UTC (permalink / raw)
  To: kvm, linux-pci
  Cc: bhelgaas, alex.williamson, noaos, haggaie, ogerlitz, liranl, ilyal

Add support for the PCIe SR-IOV extended capability with the following features:
1. The ability to probe SR-IOV BAR sizes.
2. The ability to enable and disable SR-IOV.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/vfio/pci/vfio_pci_config.c | 169 +++++++++++++++++++++++++++++++++----
 1 file changed, 152 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index ff75ca3..04e364f 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -420,6 +420,35 @@ static __le32 vfio_generate_bar_flags(struct pci_dev *pdev, int bar)
 	return cpu_to_le32(val);
 }
 
+static void vfio_sriov_bar_fixup(struct vfio_pci_device *vdev,
+				 int sriov_cap_start)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int i;
+	__le32 *bar;
+	u64 mask;
+
+	bar = (__le32 *)&vdev->vconfig[sriov_cap_start + PCI_SRIOV_BAR];
+
+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
+		if (!pci_resource_start(pdev, i)) {
+			*bar = 0; /* Unmapped by host = unimplemented to user */
+			continue;
+		}
+
+		mask = ~(pci_iov_resource_size(pdev, i) - 1);
+
+		*bar &= cpu_to_le32((u32)mask);
+		*bar |= vfio_generate_bar_flags(pdev, i);
+
+		if (*bar & cpu_to_le32(PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+			bar++;
+			*bar &= cpu_to_le32((u32)(mask >> 32));
+			i++;
+		}
+	}
+}
+
 /*
  * Pretend we're hardware and tweak the values of the *virtual* PCI BARs
  * to reflect the hardware capabilities.  This implements BAR sizing.
@@ -782,6 +811,124 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 	return 0;
 }
 
+static int __init init_pci_ext_cap_sriov_perm(struct perm_bits *perm)
+{
+	int i;
+
+	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_SRIOV]))
+		return -ENOMEM;
+
+	/*
+	 * Virtualize the first dword of all express capabilities
+	 * because it includes the next pointer.  This lets us later
+	 * remove capabilities from the chain if we need to.
+	 */
+	p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+
+	/* VF Enable - Virtualized and writable
+	 * Memory Space Enable - Non-virtualized and writable
+	 */
+	p_setw(perm, PCI_SRIOV_CTRL, PCI_SRIOV_CTRL_VFE,
+	       PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
+
+	p_setw(perm, PCI_SRIOV_NUM_VF, (u16)ALL_VIRT, (u16)ALL_WRITE);
+	p_setw(perm, PCI_SRIOV_SUP_PGSIZE, (u16)ALL_VIRT, 0);
+
+	/* We cannot let user space application change the page size
+	 * so we mark it as read only and trust the user application
+	 * (e.g. qemu) to virtualize this correctly for the guest
+	 */
+	p_setw(perm, PCI_SRIOV_SYS_PGSIZE, (u16)ALL_VIRT, 0);
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+		p_setd(perm, PCI_SRIOV_BAR + 4 * i, ALL_VIRT, ALL_WRITE);
+
+	return 0;
+}
+
+static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
+{
+	u8 cap;
+	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
+						 PCI_STD_HEADER_SIZEOF;
+	cap = vdev->pci_config_map[pos];
+
+	if (cap == PCI_CAP_ID_BASIC)
+		return 0;
+
+	/* XXX Can we have to abutting capabilities of the same type? */
+	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
+		pos--;
+
+	return pos;
+}
+
+static int vfio_sriov_cap_config_read(struct vfio_pci_device *vdev, int pos,
+				      int count, struct perm_bits *perm,
+				       int offset, __le32 *val)
+{
+	int cap_start = vfio_find_cap_start(vdev, pos);
+
+	vfio_sriov_bar_fixup(vdev, cap_start);
+	return vfio_default_config_read(vdev, pos, count, perm, offset, val);
+}
+
+static int vfio_sriov_cap_config_write(struct vfio_pci_device *vdev, int pos,
+				       int count, struct perm_bits *perm,
+				       int offset, __le32 val)
+{
+	int ret;
+	int cap_start = vfio_find_cap_start(vdev, pos);
+	u16 sriov_ctrl = *(u16 *)(vdev->vconfig + cap_start + PCI_SRIOV_CTRL);
+	bool cur_vf_enabled = sriov_ctrl & PCI_SRIOV_CTRL_VFE;
+	bool vf_enabled;
+
+	switch (offset) {
+	case  PCI_SRIOV_NUM_VF:
+	/* Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset
+	 * and VF Stride may change when NumVFs changes.
+	 *
+	 * Therefore we should pass valid writes to the hardware.
+	 *
+	 * Per SR-IOV spec sec 3.3.7
+	 * The results are undefined if NumVFs is set to a value greater
+	 * than TotalVFs.
+	 * NumVFs may only be written while VF Enable is Clear.
+	 * If NumVFs is written when VF Enable is Set, the results
+	 * are undefined.
+
+	 * Avoid passing such writes to the Hardware just in case.
+	 */
+		if (cur_vf_enabled ||
+		    val > pci_sriov_get_totalvfs(vdev->pdev))
+			return count;
+
+		pci_iov_set_numvfs(vdev->pdev, val);
+		break;
+
+	case PCI_SRIOV_CTRL:
+		vf_enabled = val & PCI_SRIOV_CTRL_VFE;
+
+		if (!cur_vf_enabled && vf_enabled) {
+			u16 num_vfs = *(u16 *)(vdev->vconfig +
+					cap_start +
+					PCI_SRIOV_NUM_VF);
+			ret = pci_enable_sriov(vdev->pdev, num_vfs);
+			if (ret)
+				return count;
+		} else if (cur_vf_enabled && !vf_enabled) {
+			pci_disable_sriov(vdev->pdev);
+		}
+		break;
+
+	default:
+		break;
+	}
+
+	return vfio_default_config_write(vdev, pos, count, perm,
+					 offset, val);
+}
+
 /*
  * Initialize the shared permission tables
  */
@@ -796,6 +943,7 @@ void vfio_pci_uninit_perm_bits(void)
 
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
 }
 
 int __init vfio_pci_init_perm_bits(void)
@@ -818,29 +966,16 @@ int __init vfio_pci_init_perm_bits(void)
 	ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 
+	ret |= init_pci_ext_cap_sriov_perm(&ecap_perms[PCI_EXT_CAP_ID_SRIOV]);
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].readfn = vfio_sriov_cap_config_read;
+	ecap_perms[PCI_EXT_CAP_ID_SRIOV].writefn = vfio_sriov_cap_config_write;
+
 	if (ret)
 		vfio_pci_uninit_perm_bits();
 
 	return ret;
 }
 
-static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
-{
-	u8 cap;
-	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
-						 PCI_STD_HEADER_SIZEOF;
-	cap = vdev->pci_config_map[pos];
-
-	if (cap == PCI_CAP_ID_BASIC)
-		return 0;
-
-	/* XXX Can we have to abutting capabilities of the same type? */
-	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
-		pos--;
-
-	return pos;
-}
-
 static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 *val)
-- 
1.8.3.1



* Re: [RFC 0/2] VFIO SRIOV support
  2015-12-22 13:42 [RFC 0/2] VFIO SRIOV support Ilya Lesokhin
  2015-12-22 13:42 ` [RFC 1/2] PCI: Expose iov_set_numvfs and iov_resource_size for modules Ilya Lesokhin
  2015-12-22 13:42 ` [RFC 2/2] VFIO: Add support for SRIOV extended capability Ilya Lesokhin
@ 2015-12-22 15:35 ` Alex Williamson
  2015-12-23  7:43   ` Ilya Lesokhin
  2 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2015-12-22 15:35 UTC (permalink / raw)
  To: Ilya Lesokhin, kvm, linux-pci; +Cc: bhelgaas, noaos, haggaie, ogerlitz, liranl

On Tue, 2015-12-22 at 15:42 +0200, Ilya Lesokhin wrote:
> Today the QEMU hypervisor allows assigning a physical device to a VM,
> facilitating driver development. However, it does not support
> enabling
> SR-IOV by the VM kernel driver. Our goal is to implement such
> support,
> allowing developers working on SR-IOV physical function drivers to
> work
> inside VMs as well.
> 
> This patch series implements the kernel side of our solution.  It
> extends
> the VFIO driver to support the PCIE SRIOV extended capability with
> following features:
> 1. The ability to probe SRIOV BAR sizes.
> 2. The ability to enable and disable sriov.
> 
> This patch series is going to be used by QEMU to expose sriov
> capabilities
> to VM. We already have an early prototype based on Knut Omang's
> patches for
> SRIOV[1]. 
> 
> Open issues:
> 1. Binding the new VFs to VFIO driver.
> Once the VM enables sriov it expects the new VFs to appear inside the
> VM.
> To this end we need to bind the new vfs to the VFIO driver and have
> QEMU
> grab them. We are currently achieve this goal using:
> echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
> but we are not happy about this solution as a system might have
> another
> device with the same id that is unrelated to our VM.
> Other solution we've considered are:
>  a. Having user space unbind and then bind the VFs to VFIO.
>      Typically resulting in an unnecessary probing of the device.
>  b. Adding a driver argument to pci_enable_sriov(...) and have
>     vfio call pci_enable_sriov with the vfio driver as argument.
>     This solution avoids the unnecessary but is more intrusive.

You could use driver_override for this, but the open issue you haven't
listed is the ownership problem: VFs will be in separate iommu groups
and therefore create separate vfio groups.  How do those get associated
with the user so that we don't have one user controlling the VFs for
another user, or worse, for the host kernel?  Whatever solution you come
up with needs to protect the host kernel, first and foremost.  It's not
sufficient to rely on userspace to grab the VFs and sequester them for
use only by that user; the host kernel needs to provide that security
automatically.  Thanks,

Alex

> 2. How to tell if it is safe to disable SRIOV?
> In the current implementation, a userspace can enable sriov, grab one
> of
> the VFs and then call disable sriov without releasing the
> device.  This
> will result in a deadlock where the user process is stuck inside
> disable
> sriov waiting for itself to release the device. Killing the process
> leaves
> it in a zombie state.
> We also get a strange warning saying:
> [  181.668492] WARNING: CPU: 22 PID: 3684 at kernel/sched/core.c:7497
> __might_sleep+0x77/0x80() 
> [  181.668502] do not call blocking ops when !TASK_RUNNING; state=1
> set at [<ffffffff810aa193>] prepare_to_wait_event+0x63/0xf0
> 
> 3. How to expose the Supported Page Sizes and System Page Size
> registers in
> the SRIOV capability? 
> Presently the hypervisor initializes Supported Page Sizes once and
> assumes
> it doesn't change therefore we cannot allow user space to change this
> register at will. The first solution that comes to mind is to expose
> a
> device that only supports the page size selected by the hypervisor.
> Unfourtently, Per SR-IOV spec section 3.3.12, PFs are required to
> support
> 4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes. We currently
> map both
> registers as virtualized and read only and leave user space to worry
> about
> this problem.
> 
> 4. Other SRIOV capabilities.
> Do we want to hide capabilities we do not support in the SR-IOV
> Capabilities register? or leave it to the userspace application?
> 
> [1] https://github.com/knuto/qemu/tree/sriov_patches_v6
> 
> Ilya Lesokhin (2):
>   PCI: Expose iov_set_numvfs and iov_resource_size for modules.
>   VFIO: Add support for SRIOV extended capablity
> 
>  drivers/pci/iov.c                  |   4 +-
>  drivers/vfio/pci/vfio_pci_config.c | 169
> +++++++++++++++++++++++++++++++++----
>  include/linux/pci.h                |   4 +
>  3 files changed, 159 insertions(+), 18 deletions(-)
> 



* RE: [RFC 0/2] VFIO SRIOV support
  2015-12-22 15:35 ` [RFC 0/2] VFIO SRIOV support Alex Williamson
@ 2015-12-23  7:43   ` Ilya Lesokhin
  2015-12-23 16:28     ` Alex Williamson
  0 siblings, 1 reply; 10+ messages in thread
From: Ilya Lesokhin @ 2015-12-23  7:43 UTC (permalink / raw)
  To: Alex Williamson, kvm, linux-pci
  Cc: bhelgaas, Noa Osherovich, Haggai Eran, Or Gerlitz, Liran Liss

Hi Alex,
Regarding driver_override, as far as I know you can only use it on devices that were already discovered. Since the devices do not exist before the call to pci_enable_sriov(...)
and are already probed after the call, it wouldn't really help us. I would have to unbind them from their default driver and bind them to VFIO, like solution (a) in my original mail.

You are right about the ownership problem, and we would like to receive input regarding the correct way to solve it.
But in the meantime I think our solution is quite useful, even if it requires root privileges. We hacked libvirt so that it would run QEMU as root and without the device cgroup.

In any case, don't you think that assigning those devices to VFIO should be safe? Does the VFIO driver make any unsafe assumptions about the VFs that might allow a guest to crash the hypervisor?

I am somewhat concerned that the VM could trigger some backdoor reset while the hypervisor is running pci_enable_sriov(...), but I'm not really sure how to solve that.
I guess you have to either stop the guest entirely to enable SR-IOV or make it privileged.

Regarding having the PF controlled by one user while the VFs are controlled by another user, I actually think it might be an interesting use case.

Thanks,
Ilya


-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@redhat.com] 
Sent: Tuesday, December 22, 2015 5:36 PM
To: Ilya Lesokhin <ilyal@mellanox.com>; kvm@vger.kernel.org; linux-pci@vger.kernel.org
Cc: bhelgaas@google.com; Noa Osherovich <noaos@mellanox.com>; Haggai Eran <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran Liss <liranl@mellanox.com>
Subject: Re: [RFC 0/2] VFIO SRIOV support

On Tue, 2015-12-22 at 15:42 +0200, Ilya Lesokhin wrote:
> Today the QEMU hypervisor allows assigning a physical device to a VM, 
> facilitating driver development. However, it does not support enabling 
> SR-IOV by the VM kernel driver. Our goal is to implement such support, 
> allowing developers working on SR-IOV physical function drivers to 
> work inside VMs as well.
> 
> This patch series implements the kernel side of our solution.  It 
> extends the VFIO driver to support the PCIE SRIOV extended capability 
> with following features:
> 1. The ability to probe SRIOV BAR sizes.
> 2. The ability to enable and disable sriov.
> 
> This patch series is going to be used by QEMU to expose sriov 
> capabilities to VM. We already have an early prototype based on Knut 
> Omang's patches for SRIOV[1].
> 
> Open issues:
> 1. Binding the new VFs to VFIO driver.
> Once the VM enables sriov it expects the new VFs to appear inside the 
> VM.
> To this end we need to bind the new vfs to the VFIO driver and have 
> QEMU grab them. We are currently achieve this goal using:
> echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
> but we are not happy about this solution as a system might have 
> another device with the same id that is unrelated to our VM.
> Other solution we've considered are:
>  a. Having user space unbind and then bind the VFs to VFIO.
>      Typically resulting in an unnecessary probing of the device.
>  b. Adding a driver argument to pci_enable_sriov(...) and have
>     vfio call pci_enable_sriov with the vfio driver as argument.
>     This solution avoids the unnecessary but is more intrusive.

You could use driver_override for this, but the open issue you haven't listed is the ownership problem, VFs will be in separate iommu groups and therefore create separate vfio groups.  How do those get associated with the user so that we don't have one user controlling the VFs for another user, or worse for the host kernel.  Whatever solution you come up with needs to protect the host kernel, first and foremost.  It's not sufficient to rely on userspace to grab the VFs and sequester them for use only by that user, the host kernel needs to provide that security automatically.  Thanks,

Alex

> 2. How to tell if it is safe to disable SRIOV?
> In the current implementation, a userspace can enable sriov, grab one 
> of the VFs and then call disable sriov without releasing the device.  
> This will result in a deadlock where the user process is stuck inside 
> disable sriov waiting for itself to release the device. Killing the 
> process leaves it in a zombie state.
> We also get a strange warning saying:
> [  181.668492] WARNING: CPU: 22 PID: 3684 at kernel/sched/core.c:7497
> __might_sleep+0x77/0x80()
> [  181.668502] do not call blocking ops when !TASK_RUNNING; state=1 
> set at [<ffffffff810aa193>] prepare_to_wait_event+0x63/0xf0
> 
> 3. How to expose the Supported Page Sizes and System Page Size 
> registers in the SRIOV capability?
> Presently the hypervisor initializes Supported Page Sizes once and 
> assumes it doesn't change therefore we cannot allow user space to 
> change this register at will. The first solution that comes to mind is 
> to expose a device that only supports the page size selected by the 
> hypervisor.
> Unfourtently, Per SR-IOV spec section 3.3.12, PFs are required to 
> support 4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes. We 
> currently map both registers as virtualized and read only and leave 
> user space to worry about this problem.
> 
> 4. Other SRIOV capabilities.
> Do we want to hide capabilities we do not support in the SR-IOV 
> Capabilities register? or leave it to the userspace application?
> 
> [1] https://github.com/knuto/qemu/tree/sriov_patches_v6
> 
> Ilya Lesokhin (2):
>   PCI: Expose iov_set_numvfs and iov_resource_size for modules.
>   VFIO: Add support for SRIOV extended capablity
> 
>  drivers/pci/iov.c                  |   4 +-
>  drivers/vfio/pci/vfio_pci_config.c | 169
> +++++++++++++++++++++++++++++++++----
>  include/linux/pci.h                |   4 +
>  3 files changed, 159 insertions(+), 18 deletions(-)
> 



* Re: [RFC 0/2] VFIO SRIOV support
  2015-12-23  7:43   ` Ilya Lesokhin
@ 2015-12-23 16:28     ` Alex Williamson
  2015-12-24  7:22         ` Ilya Lesokhin
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2015-12-23 16:28 UTC (permalink / raw)
  To: Ilya Lesokhin, kvm, linux-pci
  Cc: bhelgaas, Noa Osherovich, Haggai Eran, Or Gerlitz, Liran Liss

On Wed, 2015-12-23 at 07:43 +0000, Ilya Lesokhin wrote:
> Hi Alex,
> Regarding driver_override, as far as I know you can only use it on
> devices that were already discovered. Since the devices do not exist
> before the call to pci_enable_sriov(...)
> and are already probed after the call  it wouldn't really help us. I
> would have to unbind them from their default driver and bind them to
> VFIO like solution a in my original mail.

If you allow them to be bound to their default driver, then you've
already created the scenario of a user-owned PF creating host-owned VFs,
which I think is unacceptable.  The driver_override can be set before
drivers are probed; the fact that pci_enable_sriov() doesn't provide a
hook for that is something that could be fixed.

> You are right about the ownership problem and we would like to
> receive input regarding what is the correct way of solving this. 
> But in the meantime I think our solution is quite useful even though
> if it requires root privileges. We hacked libvirt so that it would
> run qemu as root and without device cgroup.
> 
> In any case, don't you think that assigning those devices to VFIO
> should be safe? Does the VFIO driver makes any unsafe assumptions on
> the VF's that might allow a guest to crash the hypervisor?
> 
> I am somewhat concerned that the VM  could trigger some backdoor
> reset while the hypervisor is running pci_enable_sriov(...). But I'm
> not really sure how to solve it.
> I guess you have to either stop the guest entirely to enable sriov or
> make it privileged.
> 
> Regarding having the PF controlled by one user while the other VFs
> are controlled by other user, I actually think it might be an
> interesting use case.

It may be, but it needs to be an opt-in, not a security accident.  The
interface between a PF and a VF is essentially device specific and we
don't know exactly how isolated each VF is from the PF.  In the typical
scenario of the PF being owned by the host, we have a certain degree of
trust in the host; it's running the VM after all, and if it wanted to
compromise it, it could.  We have no implicit reason to trust a PF
running in a guest though.  Can they snoop VF traffic, can they generate
DMA outside of the container of the PF using the VF?  We can't be sure.
So unless you can make the default scenario be that VFs created by a
user-owned PF are only available for use by that user, without relying on
userspace to intervene, it seems like any potential usefulness is
trumped by a giant security issue.  Thanks,

Alex



* RE: [RFC 0/2] VFIO SRIOV support
@ 2015-12-24  7:22         ` Ilya Lesokhin
  0 siblings, 0 replies; 10+ messages in thread
From: Ilya Lesokhin @ 2015-12-24  7:22 UTC (permalink / raw)
  To: Alex Williamson, kvm, linux-pci
  Cc: bhelgaas, Noa Osherovich, Haggai Eran, Or Gerlitz, Liran Liss

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, December 23, 2015 6:28 PM
> To: Ilya Lesokhin <ilyal@mellanox.com>; kvm@vger.kernel.org; linux-
> pci@vger.kernel.org
> Cc: bhelgaas@google.com; Noa Osherovich <noaos@mellanox.com>; Haggai
> Eran <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Liran
> Liss <liranl@mellanox.com>
> Subject: Re: [RFC 0/2] VFIO SRIOV support
> 
> On Wed, 2015-12-23 at 07:43 +0000, Ilya Lesokhin wrote:
> > Hi Alex,
> > Regarding driver_override, as far as I know you can only use it on
> > devices that were already discovered. Since the devices do not exist
> > before the call to pci_enable_sriov(...) and are already probed after
> > the call  it wouldn't really help us. I would have to unbind them from
> > their default driver and bind them to VFIO like solution a in my
> > original mail.
> 
> If you allow them to be bound to their default driver, then you've already
> created the scenario of a user own PF creating host own VFs, which I think is
> unacceptable.  The driver_override can be set before drivers are probed, the
> fact that pci_enable_sriov() doesn't enable a hook for that is something that
> could be fixed.

That's essentially the same as solution (b) in my original mail, which I was hoping to avoid.

> > You are right about the ownership problem and we would like to receive
> > input regarding what is the correct way of solving this.
> > But in the meantime I think our solution is quite useful even though
> > if it requires root privileges. We hacked libvirt so that it would run
> > qemu as root and without device cgroup.
> >
> > In any case, don't you think that assigning those devices to VFIO
> > should be safe? Does the VFIO driver makes any unsafe assumptions on
> > the VF's that might allow a guest to crash the hypervisor?
> >
> > I am somewhat concerned that the VM  could trigger some backdoor reset
> > while the hypervisor is running pci_enable_sriov(...). But I'm not
> > really sure how to solve it.
> > I guess you have to either stop the guest entirely to enable sriov or
> > make it privileged.
> >
> > Regarding having the PF controlled by one user while the other VFs are
> > controlled by other user, I actually think it might be an interesting
> > use case.
> 
> It may be, but it needs to be an opt-in, not a security accident.  The interface
> between a PF and a VF is essential device specific and we don't know exactly
> how isolated each VF is from the PF.  In the typical scenario of the PF being
> owned by the host, we have a certain degree of trust in the host, it's running
> the VM after all, if it wanted to compromise it, it could.  We have no implicit
> reason to trust a PF running in a guest though.  Can the snoop VF traffic, can
> they generate DMA outside of the container of the PF using the VF?  We
> can't be sure.
>  So unless you can make the default scenario be that VFs created by a user
> own PF are only available for use by that user, without relying on userspace
> to intervene, it seems like any potential usefulness is trumped by a giant
> security issue.  Thanks,

I don't understand the security issue; don't you need root permission for device assignment?

> Alex


* Re: [RFC 0/2] VFIO SRIOV support
  2015-12-24  7:22         ` Ilya Lesokhin
  (?)
@ 2015-12-24 13:51         ` Alex Williamson
  2016-01-05 11:22           ` Haggai Eran
  -1 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2015-12-24 13:51 UTC (permalink / raw)
  To: Ilya Lesokhin, kvm, linux-pci
  Cc: bhelgaas, Noa Osherovich, Haggai Eran, Or Gerlitz, Liran Liss

On Thu, 2015-12-24 at 07:22 +0000, Ilya Lesokhin wrote:
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Wednesday, December 23, 2015 6:28 PM
> > To: Ilya Lesokhin <ilyal@mellanox.com>; kvm@vger.kernel.org; linux-
> > pci@vger.kernel.org
> > Cc: bhelgaas@google.com; Noa Osherovich <noaos@mellanox.com>;
> > Haggai
> > Eran <haggaie@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
> > Liran
> > Liss <liranl@mellanox.com>
> > Subject: Re: [RFC 0/2] VFIO SRIOV support
> > 
> > On Wed, 2015-12-23 at 07:43 +0000, Ilya Lesokhin wrote:
> > > Hi Alex,
> > > Regarding driver_override, as far as I know you can only use it
> > > on
> > > devices that were already discovered. Since the devices do not
> > > exist
> > > before the call to pci_enable_sriov(...) and are already probed
> > > after
> > > the call  it wouldn't really help us. I would have to unbind them
> > > from
> > > their default driver and bind them to VFIO like solution a in my
> > > original mail.
> > 
> > If you allow them to be bound to their default driver, then you've
> > already
> > created the scenario of a user own PF creating host own VFs, which
> > I think is
> > unacceptable.  The driver_override can be set before drivers are
> > probed, the
> > fact that pci_enable_sriov() doesn't enable a hook for that is
> > something that
> > could be fixed.
> 
> That’s essentially the same as solution b in original mail which I
> was hoping to avoid.
> 
> > > You are right about the ownership problem and we would like to
> > > receive
> > > input regarding what is the correct way of solving this.
> > > But in the meantime I think our solution is quite useful even
> > > though
> > > if it requires root privileges. We hacked libvirt so that it
> > > would run
> > > qemu as root and without device cgroup.
> > > 
> > > In any case, don't you think that assigning those devices to VFIO
> > > should be safe? Does the VFIO driver makes any unsafe assumptions
> > > on
> > > the VF's that might allow a guest to crash the hypervisor?
> > > 
> > > I am somewhat concerned that the VM  could trigger some backdoor
> > > reset
> > > while the hypervisor is running pci_enable_sriov(...). But I'm
> > > not
> > > really sure how to solve it.
> > > I guess you have to either stop the guest entirely to enable
> > > sriov or
> > > make it privileged.
> > > 
> > > Regarding having the PF controlled by one user while the other
> > > VFs are
> > > controlled by other user, I actually think it might be an
> > > interesting
> > > use case.
> > 
> > It may be, but it needs to be an opt-in, not a security accident.
> >  The interface
> > between a PF and a VF is essential device specific and we don't
> > know exactly
> > how isolated each VF is from the PF.  In the typical scenario of
> > the PF being
> > owned by the host, we have a certain degree of trust in the host,
> > it's running
> > the VM after all, if it wanted to compromise it, it could.  We have
> > no implicit
> > reason to trust a PF running in a guest though.  Can the snoop VF
> > traffic, can
> > they generate DMA outside of the container of the PF using the VF?
> >  We
> > can't be sure.
> >  So unless you can make the default scenario be that VFs created by
> > a user
> > own PF are only available for use by that user, without relying on
> > userspace
> > to intervene, it seems like any potential usefulness is trumped by
> > a giant
> > security issue.  Thanks,
> 
> I don't understand the security issue, don't you need root permission
> for device assignment?

No.  A privileged entity needs to grant a user ownership of a group and
sufficient locked memory limits to make it useful, but then use of the
group does not require root permission.
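
For context, the privileged setup described here typically looks like the
following (group number, user name and limits are placeholders; see
Documentation/vfio.txt):

  # Hand the vfio group to an unprivileged user...
  chown user:user /dev/vfio/26
  # ...and raise that user's locked-memory limit, e.g. in
  # /etc/security/limits.conf:
  #   user  soft  memlock  4194304
  #   user  hard  memlock  4194304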


* Re: [RFC 0/2] VFIO SRIOV support
  2015-12-24 13:51         ` Alex Williamson
@ 2016-01-05 11:22           ` Haggai Eran
  0 siblings, 0 replies; 10+ messages in thread
From: Haggai Eran @ 2016-01-05 11:22 UTC (permalink / raw)
  To: Alex Williamson, Ilya Lesokhin, kvm, linux-pci
  Cc: bhelgaas, Noa Osherovich, Or Gerlitz, Liran Liss

On 24/12/2015 15:51, Alex Williamson wrote:
> No.  A privileged entity needs to grant a user ownership of a group and
> sufficient locked memory limits to make it useful, but then use of the
> group does not require root permission.

So we're thinking about how we can force the VFs in these cases to be in the
same IOMMU group as the PF, and make sure it is vfio-pci that probes them. We
thought about the following:

We could add a flag to pci_dev->dev_flags on the PF that says the PF's VFs
must be in the same IOMMU group as it, and modify
iommu_group_get_for_pci_dev() so that it returns the PF's group for VFs
whose PF has that flag set.

In the vfio_group_nb_add_dev() function, set driver_override to "vfio-pci" for
PCI devices that are added to a live group. That would prevent the host from
probing these devices with their default driver.
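
A rough sketch of what the two changes might look like (the flag name and
bit are invented for illustration, and the exact hook points would need
review):

  /* include/linux/pci.h: hypothetical new flag set on the PF */
  PCI_DEV_FLAGS_VFS_SHARE_PF_GROUP = (__force pci_dev_flags_t) (1 << 9),

  /* drivers/iommu/iommu.c: early in iommu_group_get_for_pci_dev() */
  if (pdev->is_virtfn && pdev->physfn &&
      (pdev->physfn->dev_flags & PCI_DEV_FLAGS_VFS_SHARE_PF_GROUP))
          return iommu_group_get(&pdev->physfn->dev);

  /* drivers/vfio/vfio.c: in vfio_group_nb_add_dev(), once we know the
   * device was added to an already-live group */
  if (dev_is_pci(dev)) {
          struct pci_dev *pdev = to_pci_dev(dev);

          if (!pdev->driver_override)
                  pdev->driver_override = kstrdup("vfio-pci", GFP_KERNEL);
  }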

What do you think?

Regards,
Haggai and Ilya
