kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 1/5] iommu: Add iommu_device_group callback and iommu_group sysfs entry
       [not found] <20110901194915.2391.97400.stgit@s20.home>
@ 2011-09-01 19:50 ` Alex Williamson
  2011-09-01 19:50 ` [RFC PATCH 2/5] intel-iommu: Implement iommu_device_group Alex Williamson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-01 19:50 UTC (permalink / raw)
  To: chrisw, aik, pmac, dwg, joerg.roedel, agraf, benve, aafabbri,
	B08248, B07421
  Cc: alex.williamson

An IOMMU group is a set of devices for which the IOMMU cannot
distinguish transactions.  For PCI devices, a group often occurs
when a PCI bridge is involved.  Transactions from any device
behind the bridge appear to be sourced from the bridge itself.
We leave it to the IOMMU driver to define the grouping restraints
for their platform.

Using this new interface, the group for a device can be retrieved
using the iommu_device_group() callback.  Users will compare the
value returned against the value returned for other devices to
determine whether they are part of the same group.  Devices with
no group are not translated by the IOMMU.  There should be no
expectations about the group numbers as they may be arbitrarily
assigned by the IOMMU driver and may not be persistent across boots.

We also provide a sysfs interface to the group numbers here so
that user space can understand IOMMU dependencies between devices
for managing safe, user space drivers.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/base/iommu.c  |   51 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h |    6 ++++++
 2 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
index 6e6b6a1..566aa17 100644
--- a/drivers/base/iommu.c
+++ b/drivers/base/iommu.c
@@ -17,20 +17,63 @@
  */
 
 #include <linux/bug.h>
+#include <linux/device.h>
 #include <linux/types.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/iommu.h>
+#include <linux/pci.h>
 
 static struct iommu_ops *iommu_ops;
 
+static ssize_t show_iommu_group(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	unsigned int groupid;
+
+	if (iommu_device_group(dev, &groupid))
+		return 0;
+
+	return sprintf(buf, "%u", groupid);
+}
+static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
+
+static int add_iommu_group(struct device *dev, void *unused)
+{
+	unsigned int groupid;
+
+	if (iommu_device_group(dev, &groupid) == 0)
+		return device_create_file(dev, &dev_attr_iommu_group);
+
+	return 0;
+}
+
+static int device_notifier(struct notifier_block *nb,
+			   unsigned long action, void *data)
+{
+	struct device *dev = data;
+
+	if (action == BUS_NOTIFY_ADD_DEVICE)
+		return add_iommu_group(dev, NULL);
+
+	return 0;
+}
+
+static struct notifier_block device_nb = {
+	.notifier_call = device_notifier,
+};
+
 void register_iommu(struct iommu_ops *ops)
 {
 	if (iommu_ops)
 		BUG();
 
 	iommu_ops = ops;
+
+	/* FIXME - non-PCI, really want for_each_bus() */
+	bus_register_notifier(&pci_bus_type, &device_nb);
+	bus_for_each_dev(&pci_bus_type, NULL, NULL, add_iommu_group);
 }
 
 bool iommu_found(void)
@@ -94,6 +137,14 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
 
+int iommu_device_group(struct device *dev, unsigned int *groupid)
+{
+	if (iommu_ops->device_group)
+		return iommu_ops->device_group(dev, groupid);
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(iommu_device_group);
+
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, int gfp_order, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0a2ba40..e3a53ed 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -45,6 +45,7 @@ struct iommu_ops {
 				    unsigned long iova);
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
+	int (*device_group)(struct device *dev, unsigned int *groupid);
 };
 
 #ifdef CONFIG_IOMMU_API
@@ -65,6 +66,7 @@ extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
 				unsigned long cap);
+extern int iommu_device_group(struct device *dev, unsigned int *groupid);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -121,6 +123,10 @@ static inline int domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
+static inline int iommu_device_group(struct device *dev, unsigned int *groupid);
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 2/5] intel-iommu: Implement iommu_device_group
       [not found] <20110901194915.2391.97400.stgit@s20.home>
  2011-09-01 19:50 ` [RFC PATCH 1/5] iommu: Add iommu_device_group callback and iommu_group sysfs entry Alex Williamson
@ 2011-09-01 19:50 ` Alex Williamson
  2011-09-01 19:50 ` [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver Alex Williamson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-01 19:50 UTC (permalink / raw)
  To: chrisw, aik, pmac, dwg, joerg.roedel, agraf, benve, aafabbri,
	B08248, B07421
  Cc: alex.williamson

We generally have BDF granularity for devices, so we just need
to make sure devices aren't hidden behind PCIe-to-PCI bridges.
We can then make up a group number that's simply the concatenated
seg|bus|dev|fn so we don't have to track them (not that users
should depend on that).

Also add an option to group multi-function (non-SR-IOV) devices
together.  It's disturbingly not uncommon for functions to have
dependencies between each other on the same device and systems
may wish to enforce that they are grouped together.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/pci/intel-iommu.c |   52 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index f02c34d..a4d9a1a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -404,6 +404,7 @@ static int dmar_map_gfx = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
+static int intel_iommu_no_mf_groups;
 
 #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
 static DEFINE_SPINLOCK(device_domain_lock);
@@ -438,6 +439,10 @@ static int __init intel_iommu_setup(char *str)
 			printk(KERN_INFO
 				"Intel-IOMMU: disable supported super page\n");
 			intel_iommu_superpage = 0;
+		} else if (!strncmp(str, "no_mf_groups", 12)) {
+			printk(KERN_INFO
+				"Intel-IOMMU: disable separate groups for multifunction devices\n");
+			intel_iommu_no_mf_groups = 1;
 		}
 
 		str += strcspn(str, ",");
@@ -3902,6 +3907,52 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
+/* Group numbers are arbitrary.  Device with the same group number
+ * indicate the iommu cannot differentiate between them.  To avoid
+ * tracking used groups we just use the seg|bus|devfn of the lowest
+ * level we're able to differentiate devices */
+static int intel_iommu_device_group(struct device *dev, unsigned int *groupid)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_dev *bridge;
+	union {
+		struct {
+			u8 devfn;
+			u8 bus;
+			u16 segment;
+		} pci;
+		u32 group;
+	} id;
+
+	if (iommu_no_mapping(dev))
+		return -ENODEV;
+
+	id.pci.segment = pci_domain_nr(pdev->bus);
+	id.pci.bus = pdev->bus->number;
+	id.pci.devfn = pdev->devfn;
+
+	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
+		return -ENODEV;
+
+	bridge = pci_find_upstream_pcie_bridge(pdev);
+	if (bridge) {
+		if (pci_is_pcie(bridge)) {
+			id.pci.bus = bridge->subordinate->number;
+			id.pci.devfn = 0;
+		} else {
+			id.pci.bus = bridge->bus->number;
+			id.pci.devfn = bridge->devfn;
+		}
+	}
+
+	/* Virtual functions always get their own group */
+	if (!pdev->is_virtfn && intel_iommu_no_mf_groups)
+		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
+
+	*groupid = id.group;
+	return 0;
+}
+
 static struct iommu_ops intel_iommu_ops = {
 	.domain_init	= intel_iommu_domain_init,
 	.domain_destroy = intel_iommu_domain_destroy,
@@ -3911,6 +3962,7 @@ static struct iommu_ops intel_iommu_ops = {
 	.unmap		= intel_iommu_unmap,
 	.iova_to_phys	= intel_iommu_iova_to_phys,
 	.domain_has_cap = intel_iommu_domain_has_cap,
+	.device_group	= intel_iommu_device_group,
 };
 
 static void __devinit quirk_iommu_rwbf(struct pci_dev *dev)


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver
       [not found] <20110901194915.2391.97400.stgit@s20.home>
  2011-09-01 19:50 ` [RFC PATCH 1/5] iommu: Add iommu_device_group callback and iommu_group sysfs entry Alex Williamson
  2011-09-01 19:50 ` [RFC PATCH 2/5] intel-iommu: Implement iommu_device_group Alex Williamson
@ 2011-09-01 19:50 ` Alex Williamson
  2011-09-01 19:50 ` [RFC PATCH 4/5] VFIO: Add PCI device support Alex Williamson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-01 19:50 UTC (permalink / raw)
  To: chrisw, aik, pmac, dwg, joerg.roedel, agraf, benve, aafabbri,
	B08248, B07421
  Cc: alex.williamson

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/Kconfig             |    2 
 drivers/Makefile            |    1 
 drivers/vfio/Kconfig        |    5 
 drivers/vfio/Makefile       |    3 
 drivers/vfio/vfio_device.c  |  109 +++++
 drivers/vfio/vfio_iommu.c   |   81 ++++
 drivers/vfio/vfio_main.c    |  879 +++++++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_private.h |   82 ++++
 8 files changed, 1162 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/vfio_device.c
 create mode 100644 drivers/vfio/vfio_iommu.c
 create mode 100644 drivers/vfio/vfio_main.c
 create mode 100644 drivers/vfio/vfio_private.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 3bb154d..5b5fffc 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -112,6 +112,8 @@ source "drivers/auxdisplay/Kconfig"
 
 source "drivers/uio/Kconfig"
 
+source "drivers/vfio/Kconfig"
+
 source "drivers/vlynq/Kconfig"
 
 source "drivers/xen/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 09f3232..6b17848 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_ATM)		+= atm/
 obj-$(CONFIG_FUSION)		+= message/
 obj-y				+= firewire/
 obj-$(CONFIG_UIO)		+= uio/
+obj-$(CONFIG_VFIO)		+= vfio/
 obj-y				+= cdrom/
 obj-y				+= auxdisplay/
 obj-$(CONFIG_PCCARD)		+= pcmcia/
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
new file mode 100644
index 0000000..a150521
--- /dev/null
+++ b/drivers/vfio/Kconfig
@@ -0,0 +1,5 @@
+menuconfig VFIO
+	tristate "Non-Privileged User Space driver"
+	depends on IOMMU_API
+	help
+	  If you don't know what to do here, say N.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
new file mode 100644
index 0000000..5eaa074
--- /dev/null
+++ b/drivers/vfio/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_VFIO) := vfio.o
+
+vfio-y := vfio_main.o vfio_iommu.o vfio_device.o
diff --git a/drivers/vfio/vfio_device.c b/drivers/vfio/vfio_device.c
new file mode 100644
index 0000000..101cbbf
--- /dev/null
+++ b/drivers/vfio/vfio_device.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2011 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+/*
+ * VFIO device module: Common device handling and callouts to other drivers
+ */
+
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/interrupt.h>
+#include <linux/fs.h>
+#include <linux/eventfd.h>
+#include <linux/uaccess.h>
+#include <linux/compat.h>
+#include <linux/vfio.h>
+
+#include "vfio_private.h"
+
+static int vfio_device_release(struct inode *inode, struct file *filep)
+{
+	struct vfio_device *vdev = filep->private_data;
+
+	mutex_lock(&vdev->vfio->group_lock);
+	vdev->refcnt--;
+	vdev->iommu->refcnt--;
+	mutex_unlock(&vdev->vfio->group_lock);
+
+	return 0;
+}
+
+static long vfio_device_unl_ioctl(struct file *filep,
+				  unsigned int cmd, unsigned long arg)
+{
+	struct vfio_device *vdev = filep->private_data;
+	int ret = -EINVAL;
+
+	switch (cmd) {
+	// TBD - what can we handle as common device ioctls?
+	default:
+		if (vdev->ops->fops.unlocked_ioctl)
+			ret = vdev->ops->fops.unlocked_ioctl(filep, cmd, arg);
+	}
+	return ret;
+}
+
+static ssize_t vfio_device_read(struct file *filep, char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct vfio_device *vdev = filep->private_data;
+
+	if (vdev->ops->fops.read)
+		return vdev->ops->fops.read(filep, buf, count, ppos);
+
+	return -EINVAL;
+}
+
+static ssize_t vfio_device_write(struct file *filep, const char __user *buf,
+				 size_t count, loff_t *ppos)
+{
+	struct vfio_device *vdev = filep->private_data;
+
+	if (vdev->ops->fops.write)
+		return vdev->ops->fops.write(filep, buf, count, ppos);
+
+	return -EINVAL;
+}
+
+static int vfio_device_mmap(struct file *filep, struct vm_area_struct *vma)
+{
+	struct vfio_device *vdev = filep->private_data;
+
+	if (vdev->ops->fops.mmap)
+		return vdev->ops->fops.mmap(filep, vma);
+
+	return -EINVAL;
+}
+	
+#ifdef CONFIG_COMPAT
+static long vfio_device_compat_ioctl(struct file *filep,
+				     unsigned int cmd, unsigned long arg)
+{
+	arg = (unsigned long)compat_ptr(arg);
+	return vfio_device_unl_ioctl(filep, cmd, arg);
+}
+#endif	/* CONFIG_COMPAT */
+
+const struct file_operations vfio_device_fops = {
+	.owner		= THIS_MODULE,
+	.release	= vfio_device_release,
+	.read		= vfio_device_read,
+	.write		= vfio_device_write,
+	.unlocked_ioctl	= vfio_device_unl_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= vfio_device_compat_ioctl,
+#endif
+	.mmap		= vfio_device_mmap,
+};
diff --git a/drivers/vfio/vfio_iommu.c b/drivers/vfio/vfio_iommu.c
new file mode 100644
index 0000000..1a6f321
--- /dev/null
+++ b/drivers/vfio/vfio_iommu.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright (C) 2011 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+/*
+ * VFIO iomm module: iommu fd callbacks
+ */
+
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+
+#include "vfio_private.h"
+
+static int vfio_iommu_release(struct inode *inode, struct file *filep)
+{
+	struct vfio_iommu *viommu = filep->private_data;
+
+	mutex_lock(&viommu->vfio->group_lock);
+	viommu->refcnt--;
+	mutex_unlock(&viommu->vfio->group_lock);
+	return 0;
+}
+
+static long vfio_iommu_unl_ioctl(struct file *filep,
+				 unsigned int cmd, unsigned long arg)
+{
+	struct vfio_iommu *viommu = filep->private_data;
+	struct vfio_dma_map dm;
+	int ret = -ENOSYS;
+
+	switch (cmd) {
+	case VFIO_IOMMU_MAP_DMA:
+		if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
+			return -EFAULT;
+		ret = 0; // XXX - Do something
+		if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
+			ret = -EFAULT;
+		break;
+
+	case VFIO_IOMMU_UNMAP_DMA:
+		if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
+			return -EFAULT;
+		ret = 0; // XXX - Do something
+		if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
+			ret = -EFAULT;
+		break;
+	}
+	return ret;
+}
+
+#ifdef CONFIG_COMPAT
+static long vfio_iommu_compat_ioctl(struct file *filep,
+				    unsigned int cmd, unsigned long arg)
+{
+	arg = (unsigned long)compat_ptr(arg);
+	return vfio_iommu_unl_ioctl(filep, cmd, arg);
+}
+#endif	/* CONFIG_COMPAT */
+
+const struct file_operations vfio_iommu_fops = {
+	.owner		= THIS_MODULE,
+	.release	= vfio_iommu_release,
+	.unlocked_ioctl	= vfio_iommu_unl_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= vfio_iommu_compat_ioctl,
+#endif
+};
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
new file mode 100644
index 0000000..7f05692
--- /dev/null
+++ b/drivers/vfio/vfio_main.c
@@ -0,0 +1,879 @@
+/*
+ * Copyright (C) 2011 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+/*
+ * VFIO main module: IOMMU group framework
+ */
+
+#include <linux/cdev.h>
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/iommu.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+
+#include "vfio_private.h"
+
+#define DRIVER_VERSION	"0.2"
+#define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
+#define DRIVER_DESC	"VFIO - User Level meta-driver"
+
+#define MAX_PATH	256
+
+static int allow_unsafe_intrs;
+module_param(allow_unsafe_intrs, int, 0);
+MODULE_PARM_DESC(allow_unsafe_intrs,
+        "Allow use of IOMMUs which do not support interrupt remapping");
+
+static struct vfio vfio;
+static const struct file_operations vfio_group_fops;
+
+static inline void vfio_container_reset_read(struct vfio_container *vcontainer)
+{
+	kfree(vcontainer->read_buf);
+	vcontainer->read_buf = NULL;
+}
+
+int vfio_group_add_dev(struct device *dev, void *data)
+{
+	struct vfio_device_ops *ops = data;
+	struct list_head *pos;
+	struct vfio_group *vgroup = NULL;
+	struct vfio_device *vdev = NULL;
+	unsigned int group;
+	int ret = 0, new_group = 0;
+
+	if (iommu_device_group(dev, &group))
+		return 0;
+
+	mutex_lock(&vfio.group_lock);
+
+	list_for_each(pos, &vfio.group_list) {
+		vgroup = list_entry(pos, struct vfio_group, next);
+		if (vgroup->group == group)
+			break;
+		vgroup = NULL;
+	}
+
+	if (!vgroup) {
+		int id;
+
+		if (unlikely(idr_pre_get(&vfio.idr, GFP_KERNEL) == 0)) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		vgroup = kzalloc(sizeof(*vgroup), GFP_KERNEL);
+		if (!vgroup) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		vgroup->group = group;
+		INIT_LIST_HEAD(&vgroup->device_list);
+
+		ret = idr_get_new(&vfio.idr, vgroup, &id);
+		if (ret == 0 && id > MINORMASK) {
+			idr_remove(&vfio.idr, id);
+			kfree(vgroup);
+			ret = -ENOSPC;
+			goto out;
+		}
+
+		vgroup->devt = MKDEV(MAJOR(vfio.devt), id);
+		list_add(&vgroup->next, &vfio.group_list);
+		device_create(vfio.class, NULL, vgroup->devt,
+			      vgroup, "%u", group);
+
+		new_group = 1;
+	} else {
+		list_for_each(pos, &vgroup->device_list) {
+			vdev = list_entry(pos, struct vfio_device, next);
+			if (vdev->dev == dev)
+				break;
+			vdev = NULL;
+		}
+	}
+
+	if (!vdev) {
+		/* Adding a device for a group that's already in use? */
+		/* Maybe we should attach to the domain so others can't */
+		BUG_ON(vgroup->container &&
+		       vgroup->container->iommu &&
+		       vgroup->container->iommu->refcnt);
+
+		vdev = ops->new(dev);
+		if (IS_ERR(vdev)) {
+			/* If we just created this vgroup, tear it down */
+			if (new_group) {
+				device_destroy(vfio.class, vgroup->devt);
+				idr_remove(&vfio.idr, MINOR(vgroup->devt));
+				list_del(&vgroup->next);
+				kfree(vgroup);
+			}
+			ret = PTR_ERR(vdev);
+			goto out;
+		}
+		list_add(&vdev->next, &vgroup->device_list);
+		vdev->dev = dev;
+		vdev->ops = ops;
+		vdev->vfio = &vfio;
+	}
+out:
+	mutex_unlock(&vfio.group_lock);
+	return ret;
+}
+
+void vfio_group_del_dev(struct device *dev)
+{
+	struct list_head *pos;
+	struct vfio_container *vcontainer;
+	struct vfio_group *vgroup = NULL;
+	struct vfio_device *vdev = NULL;
+	unsigned int group;
+
+	if (iommu_device_group(dev, &group))
+		return;
+
+	mutex_lock(&vfio.group_lock);
+
+	list_for_each(pos, &vfio.group_list) {
+		vgroup = list_entry(pos, struct vfio_group, next);
+		if (vgroup->group == group)
+			break;
+		vgroup = NULL;
+	}
+
+	if (!vgroup)
+		goto out;
+
+	vcontainer = vgroup->container;
+
+	list_for_each(pos, &vgroup->device_list) {
+		vdev = list_entry(pos, struct vfio_device, next);
+		if (vdev->dev == dev)
+			break;
+		vdev = NULL;
+	}
+
+	if (!vdev)
+		goto out;
+
+	/* XXX Did a device we're using go away? */
+	BUG_ON(vdev->refcnt);
+
+	if (vcontainer && vcontainer->iommu) {
+		iommu_detach_device(vcontainer->iommu->domain, vdev->dev);
+		vfio_container_reset_read(vcontainer);
+	}
+
+	list_del(&vdev->next);
+	vdev->ops->free(vdev);
+
+	if (list_empty(&vgroup->device_list) && vgroup->refcnt == 0) {
+		device_destroy(vfio.class, vgroup->devt);
+		idr_remove(&vfio.idr, MINOR(vgroup->devt));
+		list_del(&vgroup->next);
+		kfree(vgroup);
+	}
+out:
+	mutex_unlock(&vfio.group_lock);
+}
+
+static int __vfio_group_viable(struct vfio_container *vcontainer)
+{
+	struct list_head *gpos, *dpos;
+
+	list_for_each(gpos, &vfio.group_list) {
+		struct vfio_group *vgroup;
+		vgroup = list_entry(gpos, struct vfio_group, next);
+		if (vgroup->container != vcontainer)
+			continue;
+
+		list_for_each(dpos, &vgroup->device_list) {
+			struct vfio_device *vdev;
+			vdev = list_entry(dpos, struct vfio_device, next);
+
+			if (!vdev->dev->driver ||
+			    vdev->dev->driver->owner != THIS_MODULE)
+				return 0;
+		}
+	}
+	return 1;
+}
+
+static int __vfio_close_iommu(struct vfio_container *vcontainer)
+{
+	struct list_head *gpos, *dpos;
+	struct vfio_iommu *viommu = vcontainer->iommu;
+	struct vfio_group *vgroup;
+	struct vfio_device *vdev;
+
+	if (!viommu)
+		return 0;
+
+	if (viommu->refcnt)
+		return -EBUSY;
+
+	list_for_each(gpos, &vfio.group_list) {
+		vgroup = list_entry(gpos, struct vfio_group, next);
+		if (vgroup->container != vcontainer)
+			continue;
+
+		list_for_each(dpos, &vgroup->device_list) {
+			vdev = list_entry(dpos, struct vfio_device, next);
+			iommu_detach_device(viommu->domain, vdev->dev);
+			vdev->iommu = NULL;
+		}
+	}
+	iommu_domain_free(viommu->domain);
+	kfree(viommu);
+	vcontainer->iommu = NULL;
+	return 0;
+}
+
+static int __vfio_open_iommu(struct vfio_container *vcontainer)
+{
+	struct list_head *gpos, *dpos;
+	struct vfio_iommu *viommu;
+	struct vfio_group *vgroup;
+	struct vfio_device *vdev;
+
+	if (!__vfio_group_viable(vcontainer))
+		return -EBUSY;
+
+	viommu = kzalloc(sizeof(*viommu), GFP_KERNEL);
+	if (!viommu)
+		return -ENOMEM;
+
+	viommu->domain = iommu_domain_alloc();
+	if (!viommu->domain) {
+		kfree(viommu);
+		return -EFAULT;
+	}
+
+	viommu->vfio = &vfio;
+	vcontainer->iommu = viommu;
+
+	list_for_each(gpos, &vfio.group_list) {
+		vgroup = list_entry(gpos, struct vfio_group, next);
+		if (vgroup->container != vcontainer)
+			continue;
+
+		list_for_each(dpos, &vgroup->device_list) {
+			int ret;
+
+			vdev = list_entry(dpos, struct vfio_device, next);
+
+			ret = iommu_attach_device(viommu->domain, vdev->dev);
+			if (ret) {
+				__vfio_close_iommu(vcontainer);
+				return ret;
+			}
+			vdev->iommu = viommu;
+		}
+	}
+
+	if (!allow_unsafe_intrs &&
+	    !iommu_domain_has_cap(viommu->domain, IOMMU_CAP_INTR_REMAP)) {
+		__vfio_close_iommu(vcontainer);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int vfio_group_merge(struct vfio_group *vgroup, int fd)
+{
+	struct vfio_group *vgroup2;
+	struct iommu_domain *domain;
+	struct list_head *pos;
+	struct file *file;
+	int ret = 0;
+
+	mutex_lock(&vfio.group_lock);
+
+	file = fget(fd);
+	if (!file) {
+		ret = -EBADF;
+		goto out_noput;
+	}
+	if (file->f_op != &vfio_group_fops) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	vgroup2 = file->private_data;
+	if (!vgroup2 || vgroup2 == vgroup || vgroup2->mm != vgroup->mm ||
+	    (vgroup2->container->iommu && vgroup2->container->iommu->refcnt)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!vgroup->container->iommu) {
+		ret = __vfio_open_iommu(vgroup->container);
+		if (ret)
+			goto out;
+	}
+
+	if (!vgroup2->container->iommu) {
+		ret = __vfio_open_iommu(vgroup2->container);
+		if (ret)
+			goto out;
+	}
+
+	if (iommu_domain_has_cap(vgroup->container->iommu->domain,
+				 IOMMU_CAP_CACHE_COHERENCY) !=
+	    iommu_domain_has_cap(vgroup2->container->iommu->domain,
+				 IOMMU_CAP_CACHE_COHERENCY)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = __vfio_close_iommu(vgroup2->container);
+	if (ret)
+		goto out;
+
+	domain = vgroup->container->iommu->domain;
+
+	list_for_each(pos, &vgroup2->device_list) {
+		struct vfio_device *vdev;
+
+		vdev = list_entry(pos, struct vfio_device, next);
+
+		ret = iommu_attach_device(domain, vdev->dev);
+		if (ret) {
+			list_for_each(pos, &vgroup2->device_list) {
+				struct vfio_device *vdev2;
+
+				vdev2 = list_entry(pos,
+						   struct vfio_device, next);
+				if (vdev2 == vdev)
+					break;
+
+				iommu_detach_device(domain, vdev2->dev);
+				vdev2->iommu = NULL;
+			}
+			goto out;
+		}
+		vdev->iommu = vgroup->container->iommu;
+	}
+
+	kfree(vgroup2->container->read_buf);
+	kfree(vgroup2->container);
+
+	vgroup2->container = vgroup->container;
+	vgroup->container->refcnt++;
+	vfio_container_reset_read(vgroup->container);
+
+out:
+	fput(file);
+out_noput:
+	mutex_unlock(&vfio.group_lock);
+	return ret;
+}
+
+static int vfio_group_unmerge(struct vfio_group *vgroup, int fd)
+{
+	struct vfio_group *vgroup2;
+	struct vfio_container *vcontainer2;
+	struct vfio_device *vdev;
+	struct list_head *pos;
+	struct file *file;
+	int ret = 0;
+
+	vcontainer2 = kzalloc(sizeof(*vcontainer2), GFP_KERNEL);
+	if (!vcontainer2)
+		return -ENOMEM;
+
+	mutex_lock(&vfio.group_lock);
+
+	file = fget(fd);
+	if (!file) {
+		ret = -EBADF;
+		goto out_noput;
+	}
+	if (file->f_op != &vfio_group_fops) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	vgroup2 = file->private_data;
+	if (!vgroup2 || vgroup2 == vgroup ||
+	    vgroup2->container != vgroup->container) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	list_for_each(pos, &vgroup2->device_list) {
+		vdev = list_entry(pos, struct vfio_device, next);
+		if (vdev->refcnt) {
+			ret = -EBUSY;
+			goto out;
+		}
+	}
+
+	list_for_each(pos, &vgroup2->device_list) {
+		vdev = list_entry(pos, struct vfio_device, next);
+		iommu_detach_device(vgroup->container->iommu->domain,
+				    vdev->dev);
+		vdev->iommu = NULL;
+	}
+
+	vgroup2->container = vcontainer2;
+	vcontainer2->refcnt++;
+	vgroup->container->refcnt--;
+	vfio_container_reset_read(vgroup->container);
+out:
+	fput(file);
+out_noput:
+	if (ret)
+		kfree(vcontainer2);
+	mutex_unlock(&vfio.group_lock);
+	return ret;
+}
+
+static int vfio_group_get_iommu_fd(struct vfio_group *vgroup)
+{
+	int ret = 0;
+	struct vfio_iommu *viommu;
+
+	mutex_lock(&vfio.group_lock);
+
+	if (!vgroup->container->iommu) {
+		ret = __vfio_open_iommu(vgroup->container);
+		if (ret)
+			goto out;
+	}
+
+	viommu = vgroup->container->iommu;
+
+	if (!viommu->file) {
+		viommu->file = anon_inode_getfile("vfio-iommu",
+						  &vfio_iommu_fops,
+						  viommu, O_RDWR);
+		if (IS_ERR(viommu->file)) {
+			ret = PTR_ERR(viommu->file);
+			viommu->file = NULL;
+			goto out;
+		}
+	}
+	ret = get_unused_fd();
+	if (ret < 0)
+		goto out;
+
+	fd_install(ret, viommu->file);
+
+	vgroup->container->iommu->refcnt++;
+out:
+	mutex_unlock(&vfio.group_lock);
+	return ret;
+}
+
+static int vfio_group_get_device_fd(struct vfio_group *vgroup, char *buf)
+{
+	struct vfio_container *vcontainer = vgroup->container;
+	struct list_head *gpos, *dpos;
+	int ret = -ENODEV;
+
+	mutex_lock(&vfio.group_lock);
+
+	if (!vcontainer->iommu) {
+		ret = __vfio_open_iommu(vcontainer);
+		if (ret)
+			goto out;
+	}
+
+	list_for_each(gpos, &vfio.group_list) {
+		vgroup = list_entry(gpos, struct vfio_group, next);
+		if (vgroup->container != vcontainer)
+			continue;
+
+		list_for_each(dpos, &vgroup->device_list) {
+			struct vfio_device *vdev;
+			char buf2[MAX_PATH];
+
+			vdev = list_entry(dpos, struct vfio_device, next);
+
+			snprintf(buf2, MAX_PATH, "%s", dev_name(vdev->dev));
+
+			if (!strncmp(buf, buf2, MAX_PATH)) {
+				if (!vdev->file) {
+					vdev->file = anon_inode_getfile(
+							"vfio-device",
+							&vfio_device_fops,
+							vdev, O_RDWR);
+					if (IS_ERR(vdev->file)) {
+						ret = PTR_ERR(vdev->file);
+						vdev->file = NULL;
+						goto out;
+					}
+				}
+				ret = get_unused_fd();
+				if (ret < 0)
+					goto out;
+
+				fd_install(ret, vdev->file);
+
+				vdev->refcnt++;
+				vcontainer->iommu->refcnt++;
+				goto out;
+			}
+		}
+	}
+out:
+	mutex_unlock(&vfio.group_lock);
+	return ret;
+}
+
+static long vfio_group_unl_ioctl(struct file *filep,
+				 unsigned int cmd, unsigned long arg)
+{
+	struct vfio_group *vgroup = filep->private_data;
+
+	if (vgroup->mm != current->mm)
+		return -EIO;
+
+	switch (cmd) {
+	case VFIO_GROUP_MERGE:
+	case VFIO_GROUP_UNMERGE:
+		{
+			int fd;
+		
+			if (get_user(fd, (int __user *)arg))
+				return -EFAULT;
+			if (fd < 0)
+				return -EINVAL;
+
+			if (cmd == VFIO_GROUP_MERGE)
+				return vfio_group_merge(vgroup, fd);
+			else
+				return vfio_group_unmerge(vgroup, fd);
+		}
+	case VFIO_GROUP_GET_IOMMU_FD:
+		return vfio_group_get_iommu_fd(vgroup);
+	case VFIO_GROUP_GET_DEVICE_FD:
+		{
+			char *buf;
+			int ret;
+
+			buf = strndup_user((const char __user *)arg, MAX_PATH);
+			if (IS_ERR(buf))
+				return PTR_ERR(buf);
+
+			ret = vfio_group_get_device_fd(vgroup, buf);
+			kfree(buf);
+			return ret;
+		}
+	}
+	return -ENOSYS;
+}
+
+
+#ifdef CONFIG_COMPAT
+static long vfio_group_compat_ioctl(struct file *filep,
+				    unsigned int cmd, unsigned long arg)
+{
+	arg = (unsigned long)compat_ptr(arg);
+	return vfio_group_unl_ioctl(filep, cmd, arg);
+}
+#endif	/* CONFIG_COMPAT */
+
+static int vfio_group_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_group *vgroup;
+	int ret = 0;
+
+	mutex_lock(&vfio.group_lock);
+
+	vgroup = idr_find(&vfio.idr, iminor(inode));
+
+	if (!vgroup) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	if (!vgroup->refcnt) {
+		struct vfio_container *vcontainer;
+		vcontainer = kzalloc(sizeof(*vcontainer), GFP_KERNEL);
+		if (!vcontainer) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		vgroup->container = vcontainer;
+		vgroup->mm = current->mm;
+	} else if (current->mm != vgroup->mm) {
+		ret = -EBUSY;
+		goto out;
+	}
+	filep->private_data = vgroup;
+	vgroup->refcnt++;
+	vgroup->container->refcnt++;
+out:
+	mutex_unlock(&vfio.group_lock);
+
+	return ret;
+}
+
+static int vfio_group_release(struct inode *inode, struct file *filep)
+{
+	struct vfio_group *vgroup = filep->private_data;
+	struct vfio_container *vcontainer = vgroup->container;
+	struct list_head *pos;
+	int ret = 0;
+
+	mutex_lock(&vfio.group_lock);
+
+	if (vgroup->refcnt > 1) {
+		vgroup->refcnt--;
+		vcontainer->refcnt--;
+		goto out;
+	}
+
+	list_for_each(pos, &vgroup->device_list) {
+		struct vfio_device *vdev;
+		vdev = list_entry(pos, struct vfio_device, next);
+		if (vdev->refcnt) {
+			ret = -EBUSY;
+			goto out;
+		}
+	}
+
+	/* Merged group? */
+	if (vcontainer->refcnt > 1) {
+		if (vcontainer->iommu) {
+			list_for_each(pos, &vgroup->device_list) {
+				struct vfio_device *vdev;
+				vdev = list_entry(pos,
+						  struct vfio_device, next);
+				iommu_detach_device(vcontainer->iommu->domain,
+						    vdev->dev);
+				vdev->iommu = NULL;
+			}
+		}
+		vcontainer->refcnt--;
+		vfio_container_reset_read(vcontainer);
+	} else {
+		if (vcontainer->iommu && vcontainer->iommu->refcnt) {
+			ret = -EBUSY;
+			goto out;
+		}
+
+		ret = __vfio_close_iommu(vcontainer);
+		if (ret)
+			goto out;
+
+		kfree(vcontainer->read_buf);
+		kfree(vcontainer);
+	}
+
+	vgroup->refcnt--;
+	vgroup->mm = NULL;
+	vgroup->container = NULL;
+
+	/* Possible we had the group open while device members were removed */
+	if (list_empty(&vgroup->device_list)) {
+		device_destroy(vfio.class, vgroup->devt);
+		idr_remove(&vfio.idr, MINOR(vgroup->devt));
+		list_del(&vgroup->next);
+		kfree(vgroup);
+	}
+out:
+	mutex_unlock(&vfio.group_lock);
+	return 0;
+}
+
+static int __vfio_container_create_read_buf(struct vfio_container *vcontainer)
+{
+	struct list_head *gpos, *dpos;
+	struct vfio_group *vgroup;
+	struct vfio_device *vdev;
+	int off = 0;
+	char *buf;
+
+	buf = kzalloc(MAX_PATH, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	list_for_each(gpos, &vfio.group_list) {
+		vgroup = list_entry(gpos, struct vfio_group, next);
+		if (vgroup->container != vcontainer)
+			continue;
+
+		off += snprintf(buf + off, MAX_PATH,
+				"group: %u\n", vgroup->group);
+		buf = krealloc(buf, off + MAX_PATH, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+		memset(buf + off, 0, MAX_PATH);
+
+		list_for_each(dpos, &vgroup->device_list) {
+			vdev = list_entry(dpos, struct vfio_device, next);
+
+			off += snprintf(buf + off, MAX_PATH,
+					"device: %s\n", dev_name(vdev->dev));
+			buf = krealloc(buf, off + MAX_PATH, GFP_KERNEL);
+			if (!buf)
+				return -ENOMEM;
+			memset(buf + off, 0, MAX_PATH);
+		}
+	}
+	buf = krealloc(buf, off + 1, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	vcontainer->read_buf = buf;
+	return 0;
+}
+
+static ssize_t vfio_group_read(struct file *filep, char __user *buf,
+			       size_t count, loff_t *ppos)
+{
+	struct vfio_group *vgroup = filep->private_data;
+	struct vfio_container *vcontainer;
+	ssize_t ret = 0;
+
+	mutex_lock(&vfio.group_lock);
+
+	vcontainer = vgroup->container;
+
+	if (!vcontainer) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!vcontainer->read_buf) {
+		ret = __vfio_container_create_read_buf(vcontainer);
+		if (ret)
+			goto out;
+	}
+
+	if (*ppos >= strlen(vcontainer->read_buf) + 1) {
+		ret = 0;
+		goto out;
+	}
+
+	if (*ppos + count > strlen(vcontainer->read_buf) + 1)
+		count = strlen(vcontainer->read_buf) + 1 - *ppos;
+
+	if (copy_to_user(buf, vcontainer->read_buf + *ppos, count)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	*ppos += count;
+	ret = count;
+out:
+	mutex_unlock(&vfio.group_lock);
+	return ret;
+}
+
+static const struct file_operations vfio_group_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vfio_group_open,
+	.release	= vfio_group_release,
+	.read		= vfio_group_read,
+	.unlocked_ioctl	= vfio_group_unl_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= vfio_group_compat_ioctl,
+#endif
+};
+
+static void vfio_class_release(struct kref *kref)
+{
+	class_destroy(vfio.class);
+	vfio.class = NULL;
+}
+
+static char *vfio_devnode(struct device *dev, mode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "vfio/%s", dev_name(dev));
+}
+
+static int __init vfio_init(void)
+{
+	int ret;
+
+	idr_init(&vfio.idr);
+	mutex_init(&vfio.group_lock);
+	INIT_LIST_HEAD(&vfio.group_list);
+
+	kref_init(&vfio.kref);
+	vfio.class = class_create(THIS_MODULE, "vfio");
+	if (IS_ERR(vfio.class)) {
+		ret = PTR_ERR(vfio.class);
+		goto err_class;
+	}
+
+	vfio.class->devnode = vfio_devnode;
+
+	/* FIXME - how many minors to allocate... all of them! */
+	ret = alloc_chrdev_region(&vfio.devt, 0, MINORMASK, "vfio");
+	if (ret)
+		goto err_chrdev;
+
+	cdev_init(&vfio.cdev, &vfio_group_fops);
+	ret = cdev_add(&vfio.cdev, vfio.devt, MINORMASK);
+	if (ret)
+		goto err_cdev;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	return 0;
+
+err_cdev:
+	unregister_chrdev_region(vfio.devt, MINORMASK);
+err_chrdev:
+	kref_put(&vfio.kref, vfio_class_release);
+err_class:
+	return ret;
+}
+
+static void __exit vfio_cleanup(void)
+{
+	struct list_head *gpos, *gppos;
+
+	list_for_each_safe(gpos, gppos, &vfio.group_list) {
+		struct vfio_group *vgroup;
+		struct list_head *dpos, *dppos;
+
+		vgroup = list_entry(gpos, struct vfio_group, next);
+
+		list_for_each_safe(dpos, dppos, &vgroup->device_list) {
+			struct vfio_device *vdev;
+
+			vdev = list_entry(dpos, struct vfio_device, next);
+			vfio_group_del_dev(vdev->dev);
+		}
+	}
+
+	idr_destroy(&vfio.idr);
+	cdev_del(&vfio.cdev);
+	unregister_chrdev_region(vfio.devt, MINORMASK);
+	kref_put(&vfio.kref, vfio_class_release);
+}
+
+module_init(vfio_init);
+module_exit(vfio_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/vfio_private.h b/drivers/vfio/vfio_private.h
new file mode 100644
index 0000000..2cc300c
--- /dev/null
+++ b/drivers/vfio/vfio_private.h
@@ -0,0 +1,82 @@
+/*
+ * Copyright (C) 2011 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+
+#ifndef VFIO_PRIVATE_H
+#define VFIO_PRIVATE_H
+
+extern const struct file_operations vfio_iommu_fops;
+extern const struct file_operations vfio_device_fops;
+
+struct vfio {
+	dev_t			devt;
+	struct cdev		cdev;
+	struct list_head	group_list;
+	struct mutex		group_lock;
+	struct kref		kref;
+	struct class		*class;
+	struct idr		idr;
+};
+
+struct vfio_device_ops {
+	struct vfio_device	*(* new)(struct device *);
+	void			(* free)(struct vfio_device *);
+	struct file_operations	fops;
+};
+
+struct vfio_iommu {
+	struct iommu_domain	*domain;
+	struct vfio		*vfio;
+	int			refcnt;
+	struct file		*file;
+};
+
+struct vfio_device {
+	struct device		*dev;
+	struct list_head	next;
+	struct file		*file;
+	struct vfio_device_ops	*ops;
+	struct vfio		*vfio;
+	struct vfio_iommu	*iommu;
+	int			refcnt;
+};
+
+struct vfio_container {
+	struct vfio_iommu	*iommu;
+	char			*read_buf;
+	int			refcnt;
+};
+
+struct vfio_group {
+	dev_t			devt;
+	unsigned int		group;
+	int			refcnt;
+	struct mm_struct	*mm;
+	struct vfio_container	*container;
+	struct list_head	device_list;
+	struct list_head	next;
+};
+
+extern int vfio_group_add_dev(struct device *dev, void *data);
+extern void vfio_group_del_dev(struct device *dev);
+
+#endif /* VFIO_PRIVATE_H */


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 4/5] VFIO: Add PCI device support
       [not found] <20110901194915.2391.97400.stgit@s20.home>
                   ` (2 preceding siblings ...)
  2011-09-01 19:50 ` [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver Alex Williamson
@ 2011-09-01 19:50 ` Alex Williamson
  2011-09-01 19:50 ` [RFC PATCH 5/5] VFIO: Simple test tool Alex Williamson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-01 19:50 UTC (permalink / raw)
  To: chrisw, aik, pmac, dwg, joerg.roedel, agraf, benve, aafabbri,
	B08248, B07421
  Cc: alex.williamson

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/vfio/Kconfig        |    7 ++
 drivers/vfio/Makefile       |    1 
 drivers/vfio/vfio_main.c    |   10 +++
 drivers/vfio/vfio_pci.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_private.h |    5 ++
 5 files changed, 147 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/vfio_pci.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index a150521..b17bdbd 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -3,3 +3,10 @@ menuconfig VFIO
 	depends on IOMMU_API
 	help
 	  If you don't know what to do here, say N.
+
+menuconfig VFIO_PCI
+	bool "VFIO support for PCI devices"
+	depends on VFIO && PCI
+	default y if X86
+	help
+	  If you don't know what to do here, say N.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 5eaa074..90ee753 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_VFIO) := vfio.o
 
 vfio-y := vfio_main.o vfio_iommu.o vfio_device.o
+vfio-$(CONFIG_VFIO_PCI) += vfio_pci.o
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 7f05692..c6e80f7 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -834,6 +834,12 @@ static int __init vfio_init(void)
 	if (ret)
 		goto err_cdev;
 
+#ifdef CONFIG_VFIO_PCI
+	ret = vfio_pci_init(&vfio);
+	if (ret)
+		pr_debug(DRIVER_DESC "PCI init failed %d\n", ret);
+#endif
+
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 
 	return 0;
@@ -864,6 +870,10 @@ static void __exit vfio_cleanup(void)
 		}
 	}
 
+#ifdef CONFIG_VFIO_PCI
+	vfio_pci_cleanup(&vfio);
+#endif
+
 	idr_destroy(&vfio.idr);
 	cdev_del(&vfio.cdev);
 	unregister_chrdev_region(vfio.devt, MINORMASK);
diff --git a/drivers/vfio/vfio_pci.c b/drivers/vfio/vfio_pci.c
new file mode 100644
index 0000000..88325d0
--- /dev/null
+++ b/drivers/vfio/vfio_pci.c
@@ -0,0 +1,124 @@
+/*
+ * Copyright (C) 2011 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#include <linux/device.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/vfio.h>
+
+#include "vfio_private.h"
+
+struct vfio_pci_device {
+	struct vfio_device	vdev;
+	struct pci_dev		*pdev;
+};
+
+static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	return 0;
+}
+
+static void vfio_pci_remove(struct pci_dev *pdev)
+{
+}
+
+static struct pci_driver vfio_pci_driver = {
+	.name		= "vfio",
+	.id_table	= NULL, /* only dynamic id's */
+	.probe		= vfio_pci_probe,
+	.remove		= vfio_pci_remove,
+};
+
+static struct vfio_device *vfio_pci_new(struct device *dev)
+{
+	struct vfio_pci_device *pvdev;
+
+	pvdev = kzalloc(sizeof(*pvdev), GFP_KERNEL);
+	if (!pvdev)
+		return ERR_PTR(-ENOMEM);
+
+	printk("%s: alloc pvdev @%p\n", __FUNCTION__, pvdev);
+	pvdev->pdev = container_of(dev, struct pci_dev, dev);
+
+	// PCI stuff...
+
+	return &pvdev->vdev;
+}
+
+static void vfio_pci_free(struct vfio_device *vdev)
+{
+	struct vfio_pci_device *pvdev;
+
+	pvdev = container_of(vdev, struct vfio_pci_device, vdev);
+
+	// PCI stuff...
+
+	printk("%s: freeing pvdev @%p\n", __FUNCTION__, pvdev);
+	kfree(pvdev);
+}
+
+static const struct vfio_device_ops vfio_pci_ops = {
+	.new	= vfio_pci_new,
+	.free	= vfio_pci_free,
+};
+
+static int vfio_pci_device_notifier(struct notifier_block *nb,
+				    unsigned long action, void *data)
+{
+        struct device *dev = data;
+	struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
+
+        if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
+		return 0;
+
+        if (action == BUS_NOTIFY_ADD_DEVICE)
+                return vfio_group_add_dev(dev, (void *)&vfio_pci_ops);
+        else if (action == BUS_NOTIFY_DEL_DEVICE)
+                vfio_group_del_dev(dev);
+        return 0;
+}
+
+static int vfio_pci_add_dev(struct device *dev, void *unused)
+{
+	struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
+
+        if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
+		return 0;
+
+	return vfio_group_add_dev(dev, (void *)&vfio_pci_ops);
+}
+
+static struct notifier_block vfio_pci_device_nb = {
+        .notifier_call = vfio_pci_device_notifier,
+};
+
+void __exit vfio_pci_cleanup(struct vfio *vfio)
+{
+	bus_unregister_notifier(&pci_bus_type, &vfio_pci_device_nb);
+	pci_unregister_driver(&vfio_pci_driver);
+}
+
+int __init vfio_pci_init(struct vfio *vfio)
+{
+	int ret;
+
+	ret = pci_register_driver(&vfio_pci_driver);
+	if (ret)
+		return ret;
+
+	bus_register_notifier(&pci_bus_type, &vfio_pci_device_nb);
+	bus_for_each_dev(&pci_bus_type, NULL, NULL, vfio_pci_add_dev);
+
+	return 0;
+}
diff --git a/drivers/vfio/vfio_private.h b/drivers/vfio/vfio_private.h
index 2cc300c..85c88ea 100644
--- a/drivers/vfio/vfio_private.h
+++ b/drivers/vfio/vfio_private.h
@@ -79,4 +79,9 @@ struct vfio_group {
 extern int vfio_group_add_dev(struct device *dev, void *data);
 extern void vfio_group_del_dev(struct device *dev);
 
+#ifdef CONFIG_VFIO_PCI
+extern int vfio_pci_init(struct vfio *vfio);
+extern void vfio_pci_cleanup(struct vfio *vfio);
+#endif
+
 #endif /* VFIO_PRIVATE_H */


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 5/5] VFIO: Simple test tool
       [not found] <20110901194915.2391.97400.stgit@s20.home>
                   ` (3 preceding siblings ...)
  2011-09-01 19:50 ` [RFC PATCH 4/5] VFIO: Add PCI device support Alex Williamson
@ 2011-09-01 19:50 ` Alex Williamson
  2011-09-07 11:58 ` [RFC PATCH 0/5] VFIO-NG group/device/iommu framework Alexander Graf
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-01 19:50 UTC (permalink / raw)
  To: chrisw, aik, pmac, dwg, joerg.roedel, agraf, benve, aafabbri,
	B08248, B07421
  Cc: alex.williamson

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 tools/testing/vfio/Makefile    |    4 
 tools/testing/vfio/vfio_test.c |  406 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 410 insertions(+), 0 deletions(-)
 create mode 100644 tools/testing/vfio/Makefile
 create mode 100644 tools/testing/vfio/vfio_test.c

diff --git a/tools/testing/vfio/Makefile b/tools/testing/vfio/Makefile
new file mode 100644
index 0000000..df1fa68
--- /dev/null
+++ b/tools/testing/vfio/Makefile
@@ -0,0 +1,4 @@
+vfio_test : vfio_test.c
+
+clean :
+	rm -f vfio_test
diff --git a/tools/testing/vfio/vfio_test.c b/tools/testing/vfio/vfio_test.c
new file mode 100644
index 0000000..66eef81
--- /dev/null
+++ b/tools/testing/vfio/vfio_test.c
@@ -0,0 +1,406 @@
+/*
+ * Simple user test program for vfio group/device/iommu framework
+ *
+ * Copyright (C) 2011 Red Hat, Inc.  All rights reserved.
+ * 	Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "../../../include/linux/vfio.h"
+
+struct group {
+	int fd;
+	unsigned int number;
+	struct group *next;
+};
+
+struct group *group_list = NULL;
+
+struct device {
+	int fd;
+	char *name;
+	struct device *next;
+};
+
+struct device *device_list = NULL;
+
+struct iommu {
+	int fd;
+	struct device *next;
+};
+
+struct iommu *iommu_list = NULL;
+
+void print_group(unsigned int number)
+{
+	struct group *group = group_list;
+	char buf[4096];
+	int ret;
+
+	for (; group && group->number != number; group = group->next);
+
+	if (!group) {
+		fprintf(stderr, "Group %u not found\n", number);
+	} else {
+		ret = pread(group->fd, buf, sizeof(buf), 0);
+		if (ret < 0) {
+			fprintf(stderr, "Error reading group %u (%s)\n",
+				group, strerror(errno));
+			return;
+		}
+		fprintf(stdout, "---- Group %u (fd %d) begin ----\n",
+			number, group->fd);
+		fprintf(stdout, "%s", buf);
+		fprintf(stdout, "---- Group %u end ----\n", number);
+	}
+}
+
+void print_device(struct device *device)
+{
+	fprintf(stdout, "---- Device %s (fd %d) ----\n",
+		device->name, device->fd);
+}
+
+int do_device()
+{
+	char cmd[256];
+	int ret;
+
+	while (1) {
+		fprintf(stdout, "device command: ");
+		fscanf(stdin, "%s", cmd);
+
+		if (!strcmp(cmd, "quit") || !strcmp(cmd, "exit") ||
+		    !strcmp(cmd, "q"))
+			return 0;
+
+		if (!strcmp(cmd, "help") || !strcmp(cmd, "h")) {
+			fprintf(stdout, "[h]elp - this message\n");
+			fprintf(stdout, "[o]pen - open device\n");
+			fprintf(stdout, "[c]lose - close device\n");
+			fprintf(stdout, "[l]ist - list devices\n");
+
+		} else if (!strcmp(cmd, "open") || !strcmp(cmd, "o")) {
+			int fd;
+			struct device *device;
+
+			fprintf(stdout, "group fd #: ");
+			fscanf(stdin, "%d", &fd);
+
+			fprintf(stdout, "device name: ");
+			fscanf(stdin, "%s", cmd);
+
+			ret = ioctl(fd, VFIO_GROUP_GET_DEVICE_FD, cmd);
+			if (ret < 0) {
+				fprintf(stderr, "get device failed (%s)\n",
+					strerror(errno));
+				return ret;
+			}
+
+			device = malloc(sizeof(*device));
+			if (!device) {
+				fprintf(stderr, "malloc device failed (%s)\n",
+					strerror(errno));
+				return -1;
+			}
+
+			device->fd = ret;
+			device->name = strdup(cmd);
+			device->next = device_list;
+			device_list = device;
+			print_device(device);
+			return 0;
+
+		} else if (!strcmp(cmd, "close") || !strcmp(cmd, "c")) {
+			struct device *device;
+
+			fprintf(stdout, "device name: ");
+			fscanf(stdin, "%s", cmd);
+
+			for (device = device_list;
+			     device && strcmp(device->name, cmd);
+			     device = device->next);
+
+			if (!device) {
+				fprintf(stderr, "device not found\n");
+				return 0;
+			}
+
+			ret = close(device->fd);
+			if (ret) {
+				fprintf(stderr, "Error closing device (%s)\n",
+					strerror(errno));
+				return ret;
+			}
+			
+			if (device == device_list)
+				device_list = device->next;
+			else {
+				struct device *prev;
+
+				for (prev = device_list; prev->next != device;
+				     prev = prev->next);
+
+				prev->next = device->next;
+			}
+			free(device->name);
+			free(device);
+			return 0;
+
+		} else if (!strcmp(cmd, "list") || !strcmp(cmd, "l")) {
+			struct device *device;
+
+			for (device = device_list;
+			     device; device = device->next)
+				print_device(device);
+
+			return 0;
+		}
+	}
+	return 0;
+}
+
+void do_iommu()
+{
+
+}
+
+int main(int argc, char **argv)
+{
+	char cmd[256];
+	int ret;
+
+	while (1) {
+		fprintf(stdout, "command: ");
+		fscanf(stdin, "%s", cmd);
+
+		if (!strcmp(cmd, "quit") || !strcmp(cmd, "exit") ||
+		    !strcmp(cmd, "q"))
+			return 0;
+
+		if (!strcmp(cmd, "help") || !strcmp(cmd, "h")) {
+			fprintf(stdout, "[h]elp - this message\n");
+			fprintf(stdout, "[p]rint - print group\n");
+			fprintf(stdout, "[o]pen - open group\n");
+			fprintf(stdout, "[c]lose - close group\n");
+			fprintf(stdout, "close[f]d - close fd\n");
+			fprintf(stdout, "[m]erge - merge group\n");
+			fprintf(stdout, "[u]nmerge - unmerge group\n");
+			fprintf(stdout, "[d]evice - device commands\n");
+			fprintf(stdout, "[i]ommu - iommu commands\n");
+			fprintf(stdout, "[l]ist - list groups\n");
+
+		} else if (!strcmp(cmd, "print") || !strcmp(cmd, "p")) {
+			unsigned int number;
+
+			fprintf(stdout, "group #: ");
+			fscanf(stdin, "%u", &number);
+
+			print_group(number);
+
+		} else if (!strcmp(cmd, "device") || !strcmp(cmd, "d")) {
+			do_device();
+
+		} else if (!strcmp(cmd, "iommu") || !strcmp(cmd, "i")) {
+			do_iommu();
+
+		} else if (!strcmp(cmd, "list") || !strcmp(cmd, "l")) {
+			struct group *group;
+
+			for (group = group_list; group; group = group->next)
+				print_group(group->number);
+
+		} else if (!strcmp(cmd, "open") || !strcmp(cmd, "o")) {
+			unsigned int number;
+			struct group *group;
+			char path[256];
+
+			fprintf(stdout, "group #: ");
+			fscanf(stdin, "%u", &number);
+
+			group = malloc(sizeof(*group));
+			if (!group) {
+				fprintf(stderr, "Failed to alloc group\n");
+				return -1;
+			}
+
+			snprintf(path, sizeof(path), "/dev/vfio/%u", number);
+			group->fd = open(path, O_RDWR);
+			if (group->fd < 0) {
+				fprintf(stderr, "Failed to open %s (%s)\n",
+					path, strerror(errno));
+				free(group);
+				continue;
+			}
+			group->number = number;
+			group->next = group_list;
+			group_list = group;
+
+			print_group(number);
+
+		} else if (!strcmp(cmd, "close") || !strcmp(cmd, "c")) {
+			unsigned int number;
+			struct group *group;
+			int ret;
+
+			fprintf(stdout, "group #: ");
+			fscanf(stdin, "%u", &number);
+
+			for (group = group_list;
+			     group && group->number != number;
+			     group = group->next);
+
+			if (!group) {
+				fprintf(stderr, "group not open, open first\n");
+				continue;
+			}
+
+			ret = close(group->fd);
+			if (ret) {
+				fprintf(stderr, "close failed (%s)\n",
+					strerror(errno));
+				continue;
+			}
+
+			if (group == group_list)
+				group_list = group->next;
+			else {
+				struct group *prev;
+
+				for (prev = group_list; prev->next != group;
+				     prev = prev->next);
+
+				prev->next = group->next;
+			}
+			free(group);
+
+		} else if (!strcmp(cmd, "closefd") || !strcmp(cmd, "f")) {
+			int fd;
+			struct group *group;
+			int ret;
+
+			fprintf(stdout, "fd #: ");
+			fscanf(stdin, "%d", &fd);
+
+			for (group = group_list;
+			     group && group->fd != fd;
+			     group = group->next);
+
+			if (!group) {
+				fprintf(stderr, "group not open, open first\n");
+				continue;
+			}
+
+			ret = close(group->fd);
+			if (ret) {
+				fprintf(stderr, "close failed (%s)\n",
+					strerror(errno));
+				continue;
+			}
+
+			if (group == group_list)
+				group_list = group->next;
+			else {
+				struct group *prev;
+
+				for (prev = group_list; prev->next != group;
+				     prev = prev->next);
+
+				prev->next = group->next;
+			}
+			free(group);
+
+		} else if (!strcmp(cmd, "merge") || !strcmp(cmd, "m")) {
+			unsigned int numberA, numberB;
+			struct group *groupA, *groupB;
+			int ret;
+
+			fprintf(stdout, "base group #: ");
+			fscanf(stdin, "%u", &numberA);
+
+			for (groupA = group_list;
+			     groupA && groupA->number != numberA;
+			     groupA = groupA->next);
+
+			if (!groupA) {
+				fprintf(stderr, "group not open, open first\n");
+				continue;
+			}
+
+			fprintf(stdout, "merge group #: ");
+			fscanf(stdin, "%u", &numberB);
+
+			for (groupB = group_list;
+			     groupB && groupB->number != numberB;
+			     groupB = groupB->next);
+
+			if (!groupB) {
+				fprintf(stderr, "group not open, open first\n");
+				continue;
+			}
+
+			ret = ioctl(groupA->fd, VFIO_GROUP_MERGE, &groupB->fd);
+			if (ret) {
+				fprintf(stderr, "group merge failed (%s)\n",
+					strerror(errno));
+					continue;
+			}
+
+			print_group(numberA);
+			print_group(numberB);
+
+		} else if (!strcmp(cmd, "unmerge") || !strcmp(cmd, "u")) {
+			unsigned int numberA, numberB;
+			struct group *groupA, *groupB;
+			int ret;
+
+			fprintf(stdout, "base group #: ");
+			fscanf(stdin, "%u", &numberA);
+
+			for (groupA = group_list;
+			     groupA && groupA->number != numberA;
+			     groupA = groupA->next);
+
+			if (!groupA) {
+				fprintf(stderr, "group not open, open first\n");
+				continue;
+			}
+
+			fprintf(stdout, "unmerge group #: ");
+			fscanf(stdin, "%u", &numberB);
+
+			for (groupB = group_list;
+			     groupB && groupB->number != numberB;
+			     groupB = groupB->next);
+
+			if (!groupB) {
+				fprintf(stderr, "group not open, open first\n");
+				continue;
+			}
+
+			ret = ioctl(groupA->fd,
+				    VFIO_GROUP_UNMERGE, &groupB->fd);
+			if (ret) {
+				fprintf(stderr, "group unmerge failed (%s)\n",
+					strerror(errno));
+					continue;
+			}
+
+			print_group(numberA);
+			print_group(numberB);
+		}
+	}
+}


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/5] VFIO-NG group/device/iommu framework
       [not found] <20110901194915.2391.97400.stgit@s20.home>
                   ` (4 preceding siblings ...)
  2011-09-01 19:50 ` [RFC PATCH 5/5] VFIO: Simple test tool Alex Williamson
@ 2011-09-07 11:58 ` Alexander Graf
  2011-09-08 21:54   ` Alex Williamson
       [not found] ` <20110901195043.2391.31843.stgit@s20.home>
       [not found] ` <20110901195050.2391.12028.stgit@s20.home>
  7 siblings, 1 reply; 12+ messages in thread
From: Alexander Graf @ 2011-09-07 11:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: chrisw, aik, pmac, dwg, joerg.roedel, benve, aafabbri, B08248,
	B07421, avi, kvm, qemu-devel, iommu, linux-pci


On 01.09.2011, at 21:50, Alex Williamson wrote:

> Trying to move beyond talking about how VFIO should work to
> re-writing the code.  This is pre-alpha, known broken, will
> probably crash your system but it illustrates some of how
> I see groups, devices, and iommus interacting.  This is just
> the framework, no code to actually support user space drivers
> or device assignment yet.
> 
> The iommu portions are still using the "FIXME" PCI specific
> hooks.  Once Joerg gets some buy-in on his bus specific iommu
> patches, we can move to that.
> 
> The group management is more complicated than I'd like and
> you can get groups into a bad state by killing the test program
> with devices/iommus open.  The locking is overly simplistic.
> But, it's a start.  Please make constructive comments and
> suggestions.  Patches based on v3.0.  Thanks,

Looks pretty reasonable to me so far, but I guess we only know for sure once we have non-PCI implemented and working with this scheme as well.
Btw I couldn't find the PCI BAR regions mmaps and general config space exposure. Where has that gone?


Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver
       [not found] ` <20110901195043.2391.31843.stgit@s20.home>
@ 2011-09-07 14:52   ` Konrad Rzeszutek Wilk
  2011-09-19 16:42     ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 14:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: aafabbri, aik, kvm, pmac, qemu-devel, joerg.roedel, agraf, dwg,
	chrisw, B08248, iommu, avi, linux-pci, B07421, benve

> +static long vfio_iommu_unl_ioctl(struct file *filep,
> +				 unsigned int cmd, unsigned long arg)
> +{
> +	struct vfio_iommu *viommu = filep->private_data;
> +	struct vfio_dma_map dm;
> +	int ret = -ENOSYS;
> +
> +	switch (cmd) {
> +	case VFIO_IOMMU_MAP_DMA:
> +		if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> +			return -EFAULT;
> +		ret = 0; // XXX - Do something

<chuckles>

> +		if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> +			ret = -EFAULT;
> +		break;
> +
> +	case VFIO_IOMMU_UNMAP_DMA:
> +		if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> +			return -EFAULT;
> +		ret = 0; // XXX - Do something
> +		if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> +			ret = -EFAULT;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +#ifdef CONFIG_COMPAT
> +static long vfio_iommu_compat_ioctl(struct file *filep,
> +				    unsigned int cmd, unsigned long arg)
> +{
> +	arg = (unsigned long)compat_ptr(arg);
> +	return vfio_iommu_unl_ioctl(filep, cmd, arg);
> +}
> +#endif	/* CONFIG_COMPAT */
> +
> +const struct file_operations vfio_iommu_fops = {
> +	.owner		= THIS_MODULE,
> +	.release	= vfio_iommu_release,
> +	.unlocked_ioctl	= vfio_iommu_unl_ioctl,
> +#ifdef CONFIG_COMPAT
> +	.compat_ioctl	= vfio_iommu_compat_ioctl,
> +#endif
> +};
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
.. snip..
> +int vfio_group_add_dev(struct device *dev, void *data)
> +{
> +	struct vfio_device_ops *ops = data;
> +	struct list_head *pos;
> +	struct vfio_group *vgroup = NULL;
> +	struct vfio_device *vdev = NULL;
> +	unsigned int group;
> +	int ret = 0, new_group = 0;

'new_group' should probably be 'bool'.

> +
> +	if (iommu_device_group(dev, &group))
> +		return 0;

-EEXIST?

> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	list_for_each(pos, &vfio.group_list) {
> +		vgroup = list_entry(pos, struct vfio_group, next);
> +		if (vgroup->group == group)
> +			break;
> +		vgroup = NULL;
> +	}
> +
> +	if (!vgroup) {
> +		int id;
> +
> +		if (unlikely(idr_pre_get(&vfio.idr, GFP_KERNEL) == 0)) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		vgroup = kzalloc(sizeof(*vgroup), GFP_KERNEL);
> +		if (!vgroup) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		vgroup->group = group;
> +		INIT_LIST_HEAD(&vgroup->device_list);
> +
> +		ret = idr_get_new(&vfio.idr, vgroup, &id);
> +		if (ret == 0 && id > MINORMASK) {
> +			idr_remove(&vfio.idr, id);
> +			kfree(vgroup);
> +			ret = -ENOSPC;
> +			goto out;
> +		}
> +
> +		vgroup->devt = MKDEV(MAJOR(vfio.devt), id);
> +		list_add(&vgroup->next, &vfio.group_list);
> +		device_create(vfio.class, NULL, vgroup->devt,
> +			      vgroup, "%u", group);
> +
> +		new_group = 1;
> +	} else {
> +		list_for_each(pos, &vgroup->device_list) {
> +			vdev = list_entry(pos, struct vfio_device, next);
> +			if (vdev->dev == dev)
> +				break;
> +			vdev = NULL;
> +		}
> +	}
> +
> +	if (!vdev) {
> +		/* Adding a device for a group that's already in use? */
> +		/* Maybe we should attach to the domain so others can't */
> +		BUG_ON(vgroup->container &&
> +		       vgroup->container->iommu &&
> +		       vgroup->container->iommu->refcnt);
> +
> +		vdev = ops->new(dev);
> +		if (IS_ERR(vdev)) {
> +			/* If we just created this vgroup, tear it down */
> +			if (new_group) {
> +				device_destroy(vfio.class, vgroup->devt);
> +				idr_remove(&vfio.idr, MINOR(vgroup->devt));
> +				list_del(&vgroup->next);
> +				kfree(vgroup);
> +			}
> +			ret = PTR_ERR(vdev);
> +			goto out;
> +		}
> +		list_add(&vdev->next, &vgroup->device_list);
> +		vdev->dev = dev;
> +		vdev->ops = ops;
> +		vdev->vfio = &vfio;
> +	}
> +out:
> +	mutex_unlock(&vfio.group_lock);
> +	return ret;
> +}
> +
> +void vfio_group_del_dev(struct device *dev)
> +{
> +	struct list_head *pos;
> +	struct vfio_container *vcontainer;
> +	struct vfio_group *vgroup = NULL;
> +	struct vfio_device *vdev = NULL;
> +	unsigned int group;
> +
> +	if (iommu_device_group(dev, &group))
> +		return;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	list_for_each(pos, &vfio.group_list) {
> +		vgroup = list_entry(pos, struct vfio_group, next);
> +		if (vgroup->group == group)
> +			break;
> +		vgroup = NULL;
> +	}
> +
> +	if (!vgroup)
> +		goto out;
> +
> +	vcontainer = vgroup->container;
> +
> +	list_for_each(pos, &vgroup->device_list) {
> +		vdev = list_entry(pos, struct vfio_device, next);
> +		if (vdev->dev == dev)
> +			break;
> +		vdev = NULL;
> +	}
> +
> +	if (!vdev)
> +		goto out;
> +
> +	/* XXX Did a device we're using go away? */
> +	BUG_ON(vdev->refcnt);
> +
> +	if (vcontainer && vcontainer->iommu) {
> +		iommu_detach_device(vcontainer->iommu->domain, vdev->dev);
> +		vfio_container_reset_read(vcontainer);
> +	}
> +
> +	list_del(&vdev->next);
> +	vdev->ops->free(vdev);
> +
> +	if (list_empty(&vgroup->device_list) && vgroup->refcnt == 0) {
> +		device_destroy(vfio.class, vgroup->devt);
> +		idr_remove(&vfio.idr, MINOR(vgroup->devt));
> +		list_del(&vgroup->next);
> +		kfree(vgroup);
> +	}
> +out:
> +	mutex_unlock(&vfio.group_lock);
> +}
> +
> +static int __vfio_group_viable(struct vfio_container *vcontainer)

Just return 'bool'

> +{
> +	struct list_head *gpos, *dpos;
> +
> +	list_for_each(gpos, &vfio.group_list) {
> +		struct vfio_group *vgroup;
> +		vgroup = list_entry(gpos, struct vfio_group, next);
> +		if (vgroup->container != vcontainer)
> +			continue;
> +
> +		list_for_each(dpos, &vgroup->device_list) {
> +			struct vfio_device *vdev;
> +			vdev = list_entry(dpos, struct vfio_device, next);
> +
> +			if (!vdev->dev->driver ||
> +			    vdev->dev->driver->owner != THIS_MODULE)
> +				return 0;
> +		}
> +	}
> +	return 1;
> +}
> +
> +static int __vfio_close_iommu(struct vfio_container *vcontainer)
> +{
> +	struct list_head *gpos, *dpos;
> +	struct vfio_iommu *viommu = vcontainer->iommu;
> +	struct vfio_group *vgroup;
> +	struct vfio_device *vdev;
> +
> +	if (!viommu)
> +		return 0;
> +
> +	if (viommu->refcnt)
> +		return -EBUSY;
> +
> +	list_for_each(gpos, &vfio.group_list) {
> +		vgroup = list_entry(gpos, struct vfio_group, next);
> +		if (vgroup->container != vcontainer)
> +			continue;
> +
> +		list_for_each(dpos, &vgroup->device_list) {
> +			vdev = list_entry(dpos, struct vfio_device, next);
> +			iommu_detach_device(viommu->domain, vdev->dev);
> +			vdev->iommu = NULL;
> +		}
> +	}
> +	iommu_domain_free(viommu->domain);
> +	kfree(viommu);
> +	vcontainer->iommu = NULL;
> +	return 0;
> +}
> +
> +static int __vfio_open_iommu(struct vfio_container *vcontainer)
> +{
> +	struct list_head *gpos, *dpos;
> +	struct vfio_iommu *viommu;
> +	struct vfio_group *vgroup;
> +	struct vfio_device *vdev;
> +
> +	if (!__vfio_group_viable(vcontainer))
> +		return -EBUSY;
> +
> +	viommu = kzalloc(sizeof(*viommu), GFP_KERNEL);
> +	if (!viommu)
> +		return -ENOMEM;
> +
> +	viommu->domain = iommu_domain_alloc();
> +	if (!viommu->domain) {
> +		kfree(viommu);
> +		return -EFAULT;
> +	}
> +
> +	viommu->vfio = &vfio;
> +	vcontainer->iommu = viommu;
> +

No need for
  mutex_lock(&vfio.group_lock);

Ah, you already hold the lock when using this function.

> +	list_for_each(gpos, &vfio.group_list) {
> +		vgroup = list_entry(gpos, struct vfio_group, next);
> +		if (vgroup->container != vcontainer)
> +			continue;
> +
> +		list_for_each(dpos, &vgroup->device_list) {
> +			int ret;
> +
> +			vdev = list_entry(dpos, struct vfio_device, next);
> +
> +			ret = iommu_attach_device(viommu->domain, vdev->dev);
> +			if (ret) {
> +				__vfio_close_iommu(vcontainer);
> +				return ret;
> +			}
> +			vdev->iommu = viommu;
> +		}
> +	}
> +
> +	if (!allow_unsafe_intrs &&
> +	    !iommu_domain_has_cap(viommu->domain, IOMMU_CAP_INTR_REMAP)) {
> +		__vfio_close_iommu(vcontainer);
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vfio_group_merge(struct vfio_group *vgroup, int fd)
> +{
> +	struct vfio_group *vgroup2;
> +	struct iommu_domain *domain;
> +	struct list_head *pos;
> +	struct file *file;
> +	int ret = 0;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	file = fget(fd);
> +	if (!file) {
> +		ret = -EBADF;
> +		goto out_noput;
> +	}
> +	if (file->f_op != &vfio_group_fops) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	vgroup2 = file->private_data;
> +	if (!vgroup2 || vgroup2 == vgroup || vgroup2->mm != vgroup->mm ||
> +	    (vgroup2->container->iommu && vgroup2->container->iommu->refcnt)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (!vgroup->container->iommu) {
> +		ret = __vfio_open_iommu(vgroup->container);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	if (!vgroup2->container->iommu) {
> +		ret = __vfio_open_iommu(vgroup2->container);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	if (iommu_domain_has_cap(vgroup->container->iommu->domain,
> +				 IOMMU_CAP_CACHE_COHERENCY) !=
> +	    iommu_domain_has_cap(vgroup2->container->iommu->domain,
> +				 IOMMU_CAP_CACHE_COHERENCY)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = __vfio_close_iommu(vgroup2->container);
> +	if (ret)
> +		goto out;
> +
> +	domain = vgroup->container->iommu->domain;
> +
> +	list_for_each(pos, &vgroup2->device_list) {
> +		struct vfio_device *vdev;
> +
> +		vdev = list_entry(pos, struct vfio_device, next);
> +
> +		ret = iommu_attach_device(domain, vdev->dev);
> +		if (ret) {
> +			list_for_each(pos, &vgroup2->device_list) {
> +				struct vfio_device *vdev2;
> +
> +				vdev2 = list_entry(pos,
> +						   struct vfio_device, next);
> +				if (vdev2 == vdev)
> +					break;
> +
> +				iommu_detach_device(domain, vdev2->dev);
> +				vdev2->iommu = NULL;
> +			}
> +			goto out;
> +		}
> +		vdev->iommu = vgroup->container->iommu;
> +	}
> +
> +	kfree(vgroup2->container->read_buf);
> +	kfree(vgroup2->container);
> +
> +	vgroup2->container = vgroup->container;
> +	vgroup->container->refcnt++;
> +	vfio_container_reset_read(vgroup->container);
> +
> +out:
> +	fput(file);
> +out_noput:
> +	mutex_unlock(&vfio.group_lock);
> +	return ret;
> +}
> +
> +static int vfio_group_unmerge(struct vfio_group *vgroup, int fd)
> +{
> +	struct vfio_group *vgroup2;
> +	struct vfio_container *vcontainer2;
> +	struct vfio_device *vdev;
> +	struct list_head *pos;
> +	struct file *file;
> +	int ret = 0;
> +
> +	vcontainer2 = kzalloc(sizeof(*vcontainer2), GFP_KERNEL);
> +	if (!vcontainer2)
> +		return -ENOMEM;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	file = fget(fd);
> +	if (!file) {
> +		ret = -EBADF;
> +		goto out_noput;
> +	}
> +	if (file->f_op != &vfio_group_fops) {

Hm, I think scripts/checkpath.pl will not like that, but as
you said - it is RFC.

> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	vgroup2 = file->private_data;
> +	if (!vgroup2 || vgroup2 == vgroup ||
> +	    vgroup2->container != vgroup->container) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	list_for_each(pos, &vgroup2->device_list) {
> +		vdev = list_entry(pos, struct vfio_device, next);
> +		if (vdev->refcnt) {
> +			ret = -EBUSY;
> +			goto out;
> +		}
> +	}
> +
> +	list_for_each(pos, &vgroup2->device_list) {
> +		vdev = list_entry(pos, struct vfio_device, next);
> +		iommu_detach_device(vgroup->container->iommu->domain,
> +				    vdev->dev);
> +		vdev->iommu = NULL;
> +	}
> +
> +	vgroup2->container = vcontainer2;
> +	vcontainer2->refcnt++;
> +	vgroup->container->refcnt--;
> +	vfio_container_reset_read(vgroup->container);
> +out:
> +	fput(file);
> +out_noput:
> +	if (ret)
> +		kfree(vcontainer2);
> +	mutex_unlock(&vfio.group_lock);
> +	return ret;
> +}
> +
> +static int vfio_group_get_iommu_fd(struct vfio_group *vgroup)
> +{
> +	int ret = 0;
> +	struct vfio_iommu *viommu;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	if (!vgroup->container->iommu) {
> +		ret = __vfio_open_iommu(vgroup->container);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	viommu = vgroup->container->iommu;
> +
> +	if (!viommu->file) {
> +		viommu->file = anon_inode_getfile("vfio-iommu",
> +						  &vfio_iommu_fops,
> +						  viommu, O_RDWR);
> +		if (IS_ERR(viommu->file)) {
> +			ret = PTR_ERR(viommu->file);
> +			viommu->file = NULL;
> +			goto out;
> +		}
> +	}
> +	ret = get_unused_fd();
> +	if (ret < 0)
> +		goto out;
> +
> +	fd_install(ret, viommu->file);
> +
> +	vgroup->container->iommu->refcnt++;
> +out:
> +	mutex_unlock(&vfio.group_lock);
> +	return ret;
> +}
> +
> +static int vfio_group_get_device_fd(struct vfio_group *vgroup, char *buf)
> +{
> +	struct vfio_container *vcontainer = vgroup->container;
> +	struct list_head *gpos, *dpos;
> +	int ret = -ENODEV;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	if (!vcontainer->iommu) {
> +		ret = __vfio_open_iommu(vcontainer);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	list_for_each(gpos, &vfio.group_list) {
> +		vgroup = list_entry(gpos, struct vfio_group, next);
> +		if (vgroup->container != vcontainer)
> +			continue;
> +
> +		list_for_each(dpos, &vgroup->device_list) {
> +			struct vfio_device *vdev;
> +			char buf2[MAX_PATH];
> +
> +			vdev = list_entry(dpos, struct vfio_device, next);
> +
> +			snprintf(buf2, MAX_PATH, "%s", dev_name(vdev->dev));
> +
> +			if (!strncmp(buf, buf2, MAX_PATH)) {
> +				if (!vdev->file) {
> +					vdev->file = anon_inode_getfile(
> +							"vfio-device",
> +							&vfio_device_fops,
> +							vdev, O_RDWR);
> +					if (IS_ERR(vdev->file)) {
> +						ret = PTR_ERR(vdev->file);
> +						vdev->file = NULL;
> +						goto out;
> +					}
> +				}
> +				ret = get_unused_fd();
> +				if (ret < 0)
> +					goto out;
> +
> +				fd_install(ret, vdev->file);
> +
> +				vdev->refcnt++;
> +				vcontainer->iommu->refcnt++;
> +				goto out;
> +			}
> +		}
> +	}
> +out:
> +	mutex_unlock(&vfio.group_lock);
> +	return ret;
> +}
> +
> +static long vfio_group_unl_ioctl(struct file *filep,
> +				 unsigned int cmd, unsigned long arg)
> +{
> +	struct vfio_group *vgroup = filep->private_data;
> +
> +	if (vgroup->mm != current->mm)
> +		return -EIO;
> +
> +	switch (cmd) {
> +	case VFIO_GROUP_MERGE:
> +	case VFIO_GROUP_UNMERGE:
> +		{
> +			int fd;
> +		
> +			if (get_user(fd, (int __user *)arg))
> +				return -EFAULT;
> +			if (fd < 0)
> +				return -EINVAL;
> +
> +			if (cmd == VFIO_GROUP_MERGE)
> +				return vfio_group_merge(vgroup, fd);
> +			else
> +				return vfio_group_unmerge(vgroup, fd);
> +		}
> +	case VFIO_GROUP_GET_IOMMU_FD:
> +		return vfio_group_get_iommu_fd(vgroup);
> +	case VFIO_GROUP_GET_DEVICE_FD:
> +		{
> +			char *buf;
> +			int ret;
> +
> +			buf = strndup_user((const char __user *)arg, MAX_PATH);
> +			if (IS_ERR(buf))
> +				return PTR_ERR(buf);
> +
> +			ret = vfio_group_get_device_fd(vgroup, buf);
> +			kfree(buf);
> +			return ret;
> +		}
> +	}
> +	return -ENOSYS;
> +}
> +
> +
> +#ifdef CONFIG_COMPAT
> +static long vfio_group_compat_ioctl(struct file *filep,
> +				    unsigned int cmd, unsigned long arg)
> +{
> +	arg = (unsigned long)compat_ptr(arg);
> +	return vfio_group_unl_ioctl(filep, cmd, arg);
> +}
> +#endif	/* CONFIG_COMPAT */
> +
> +static int vfio_group_open(struct inode *inode, struct file *filep)
> +{
> +	struct vfio_group *vgroup;
> +	int ret = 0;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	vgroup = idr_find(&vfio.idr, iminor(inode));
> +
> +	if (!vgroup) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	if (!vgroup->refcnt) {
> +		struct vfio_container *vcontainer;
> +		vcontainer = kzalloc(sizeof(*vcontainer), GFP_KERNEL);
> +		if (!vcontainer) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		vgroup->container = vcontainer;
> +		vgroup->mm = current->mm;
> +	} else if (current->mm != vgroup->mm) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +	filep->private_data = vgroup;
> +	vgroup->refcnt++;
> +	vgroup->container->refcnt++;
> +out:
> +	mutex_unlock(&vfio.group_lock);
> +
> +	return ret;
> +}
> +
> +static int vfio_group_release(struct inode *inode, struct file *filep)
> +{
> +	struct vfio_group *vgroup = filep->private_data;
> +	struct vfio_container *vcontainer = vgroup->container;
> +	struct list_head *pos;
> +	int ret = 0;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	if (vgroup->refcnt > 1) {
> +		vgroup->refcnt--;
> +		vcontainer->refcnt--;
> +		goto out;
> +	}
> +
> +	list_for_each(pos, &vgroup->device_list) {
> +		struct vfio_device *vdev;
> +		vdev = list_entry(pos, struct vfio_device, next);
> +		if (vdev->refcnt) {
> +			ret = -EBUSY;
> +			goto out;
> +		}
> +	}
> +
> +	/* Merged group? */
> +	if (vcontainer->refcnt > 1) {
> +		if (vcontainer->iommu) {
> +			list_for_each(pos, &vgroup->device_list) {
> +				struct vfio_device *vdev;
> +				vdev = list_entry(pos,
> +						  struct vfio_device, next);
> +				iommu_detach_device(vcontainer->iommu->domain,
> +						    vdev->dev);
> +				vdev->iommu = NULL;
> +			}
> +		}
> +		vcontainer->refcnt--;
> +		vfio_container_reset_read(vcontainer);
> +	} else {
> +		if (vcontainer->iommu && vcontainer->iommu->refcnt) {
> +			ret = -EBUSY;
> +			goto out;
> +		}
> +
> +		ret = __vfio_close_iommu(vcontainer);
> +		if (ret)
> +			goto out;
> +
> +		kfree(vcontainer->read_buf);
> +		kfree(vcontainer);
> +	}
> +
> +	vgroup->refcnt--;
> +	vgroup->mm = NULL;
> +	vgroup->container = NULL;
> +
> +	/* Possible we had the group open while device members were removed */
> +	if (list_empty(&vgroup->device_list)) {
> +		device_destroy(vfio.class, vgroup->devt);
> +		idr_remove(&vfio.idr, MINOR(vgroup->devt));
> +		list_del(&vgroup->next);
> +		kfree(vgroup);
> +	}
> +out:
> +	mutex_unlock(&vfio.group_lock);
> +	return 0;
> +}
> +
> +static int __vfio_container_create_read_buf(struct vfio_container *vcontainer)
> +{
> +	struct list_head *gpos, *dpos;
> +	struct vfio_group *vgroup;
> +	struct vfio_device *vdev;
> +	int off = 0;
> +	char *buf;
> +
> +	buf = kzalloc(MAX_PATH, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	list_for_each(gpos, &vfio.group_list) {
> +		vgroup = list_entry(gpos, struct vfio_group, next);
> +		if (vgroup->container != vcontainer)
> +			continue;
> +
> +		off += snprintf(buf + off, MAX_PATH,
> +				"group: %u\n", vgroup->group);
> +		buf = krealloc(buf, off + MAX_PATH, GFP_KERNEL);
> +		if (!buf)
> +			return -ENOMEM;
> +		memset(buf + off, 0, MAX_PATH);
> +
> +		list_for_each(dpos, &vgroup->device_list) {
> +			vdev = list_entry(dpos, struct vfio_device, next);
> +
> +			off += snprintf(buf + off, MAX_PATH,
> +					"device: %s\n", dev_name(vdev->dev));
> +			buf = krealloc(buf, off + MAX_PATH, GFP_KERNEL);
> +			if (!buf)
> +				return -ENOMEM;
> +			memset(buf + off, 0, MAX_PATH);
> +		}
> +	}
> +	buf = krealloc(buf, off + 1, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	vcontainer->read_buf = buf;
> +	return 0;
> +}
> +
> +static ssize_t vfio_group_read(struct file *filep, char __user *buf,
> +			       size_t count, loff_t *ppos)
> +{
> +	struct vfio_group *vgroup = filep->private_data;
> +	struct vfio_container *vcontainer;
> +	ssize_t ret = 0;
> +
> +	mutex_lock(&vfio.group_lock);
> +
> +	vcontainer = vgroup->container;
> +
> +	if (!vcontainer) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (!vcontainer->read_buf) {
> +		ret = __vfio_container_create_read_buf(vcontainer);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	if (*ppos >= strlen(vcontainer->read_buf) + 1) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	if (*ppos + count > strlen(vcontainer->read_buf) + 1)
> +		count = strlen(vcontainer->read_buf) + 1 - *ppos;
> +
> +	if (copy_to_user(buf, vcontainer->read_buf + *ppos, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	*ppos += count;
> +	ret = count;
> +out:
> +	mutex_unlock(&vfio.group_lock);
> +	return ret;
> +}
> +
> +static const struct file_operations vfio_group_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= vfio_group_open,
> +	.release	= vfio_group_release,
> +	.read		= vfio_group_read,
> +	.unlocked_ioctl	= vfio_group_unl_ioctl,
> +#ifdef CONFIG_COMPAT
> +	.compat_ioctl	= vfio_group_compat_ioctl,
> +#endif
> +};
> +
> +static void vfio_class_release(struct kref *kref)
> +{
> +	class_destroy(vfio.class);
> +	vfio.class = NULL;
> +}
> +
> +static char *vfio_devnode(struct device *dev, mode_t *mode)
> +{
> +	return kasprintf(GFP_KERNEL, "vfio/%s", dev_name(dev));
> +}
> +
> +static int __init vfio_init(void)
> +{
> +	int ret;
> +
> +	idr_init(&vfio.idr);
> +	mutex_init(&vfio.group_lock);
> +	INIT_LIST_HEAD(&vfio.group_list);
> +
> +	kref_init(&vfio.kref);
> +	vfio.class = class_create(THIS_MODULE, "vfio");
> +	if (IS_ERR(vfio.class)) {
> +		ret = PTR_ERR(vfio.class);
> +		goto err_class;
> +	}
> +
> +	vfio.class->devnode = vfio_devnode;
> +
> +	/* FIXME - how many minors to allocate... all of them! */
> +	ret = alloc_chrdev_region(&vfio.devt, 0, MINORMASK, "vfio");
> +	if (ret)
> +		goto err_chrdev;
> +
> +	cdev_init(&vfio.cdev, &vfio_group_fops);
> +	ret = cdev_add(&vfio.cdev, vfio.devt, MINORMASK);
> +	if (ret)
> +		goto err_cdev;
> +
> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> +
> +	return 0;
> +
> +err_cdev:
> +	unregister_chrdev_region(vfio.devt, MINORMASK);
> +err_chrdev:
> +	kref_put(&vfio.kref, vfio_class_release);
> +err_class:
> +	return ret;
> +}
> +
> +static void __exit vfio_cleanup(void)
> +{
> +	struct list_head *gpos, *gppos;
> +
> +	list_for_each_safe(gpos, gppos, &vfio.group_list) {
> +		struct vfio_group *vgroup;
> +		struct list_head *dpos, *dppos;
> +
> +		vgroup = list_entry(gpos, struct vfio_group, next);
> +
> +		list_for_each_safe(dpos, dppos, &vgroup->device_list) {
> +			struct vfio_device *vdev;
> +
> +			vdev = list_entry(dpos, struct vfio_device, next);
> +			vfio_group_del_dev(vdev->dev);
> +		}
> +	}
> +
> +	idr_destroy(&vfio.idr);
> +	cdev_del(&vfio.cdev);
> +	unregister_chrdev_region(vfio.devt, MINORMASK);
> +	kref_put(&vfio.kref, vfio_class_release);
> +}
> +
> +module_init(vfio_init);
> +module_exit(vfio_cleanup);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/drivers/vfio/vfio_private.h b/drivers/vfio/vfio_private.h
> new file mode 100644
> index 0000000..2cc300c
> --- /dev/null
> +++ b/drivers/vfio/vfio_private.h
> @@ -0,0 +1,82 @@
> +/*
> + * Copyright (C) 2011 Red Hat, Inc.  All rights reserved.
> + *     Author: Alex Williamson <alex.williamson@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Derived from original vfio:
> + * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
> + * Author: Tom Lyon, pugs@cisco.com
> + */
> +
> +#include <linux/cdev.h>
> +#include <linux/device.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/idr.h>
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/mm.h>
> +#include <linux/mutex.h>
> +
> +#ifndef VFIO_PRIVATE_H
> +#define VFIO_PRIVATE_H
> +
> +extern const struct file_operations vfio_iommu_fops;
> +extern const struct file_operations vfio_device_fops;
> +
> +struct vfio {
> +	dev_t			devt;
> +	struct cdev		cdev;
> +	struct list_head	group_list;
> +	struct mutex		group_lock;
> +	struct kref		kref;
> +	struct class		*class;
> +	struct idr		idr;
> +};
> +
> +struct vfio_device_ops {
> +	struct vfio_device	*(* new)(struct device *);
> +	void			(* free)(struct vfio_device *);
> +	struct file_operations	fops;
> +};
> +
> +struct vfio_iommu {
> +	struct iommu_domain	*domain;
> +	struct vfio		*vfio;
> +	int			refcnt;
> +	struct file		*file;
> +};
> +
> +struct vfio_device {
> +	struct device		*dev;
> +	struct list_head	next;
> +	struct file		*file;
> +	struct vfio_device_ops	*ops;
> +	struct vfio		*vfio;
> +	struct vfio_iommu	*iommu;
> +	int			refcnt;
> +};
> +
> +struct vfio_container {
> +	struct vfio_iommu	*iommu;
> +	char			*read_buf;
> +	int			refcnt;
> +};
> +
> +struct vfio_group {
> +	dev_t			devt;
> +	unsigned int		group;
> +	int			refcnt;
> +	struct mm_struct	*mm;
> +	struct vfio_container	*container;
> +	struct list_head	device_list;
> +	struct list_head	next;
> +};
> +
> +extern int vfio_group_add_dev(struct device *dev, void *data);
> +extern void vfio_group_del_dev(struct device *dev);
> +
> +#endif /* VFIO_PRIVATE_H */
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 4/5] VFIO: Add PCI device support
       [not found] ` <20110901195050.2391.12028.stgit@s20.home>
@ 2011-09-07 18:55   ` Konrad Rzeszutek Wilk
  2011-09-08  7:52     ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 18:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: aafabbri, aik, kvm, pmac, qemu-devel, joerg.roedel, agraf, dwg,
	chrisw, B08248, iommu, avi, linux-pci, B07421, benve

On Thu, Sep 01, 2011 at 01:50:50PM -0600, Alex Williamson wrote:
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  drivers/vfio/Kconfig        |    7 ++
>  drivers/vfio/Makefile       |    1 
>  drivers/vfio/vfio_main.c    |   10 +++
>  drivers/vfio/vfio_pci.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio_private.h |    5 ++
>  5 files changed, 147 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/vfio/vfio_pci.c
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index a150521..b17bdbd 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -3,3 +3,10 @@ menuconfig VFIO
>  	depends on IOMMU_API
>  	help
>  	  If you don't know what to do here, say N.
> +
> +menuconfig VFIO_PCI
> +	bool "VFIO support for PCI devices"
> +	depends on VFIO && PCI
> +	default y if X86

Hahah.. And Linus is going to tear your behind for that.

Default should be 'n'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 4/5] VFIO: Add PCI device support
  2011-09-07 18:55   ` [RFC PATCH 4/5] VFIO: Add PCI device support Konrad Rzeszutek Wilk
@ 2011-09-08  7:52     ` Avi Kivity
  2011-09-08 21:52       ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2011-09-08  7:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: aafabbri, aik, kvm, pmac, qemu-devel, joerg.roedel, agraf, iommu,
	dwg, chrisw, B08248, Alex Williamson, linux-pci, B07421, benve

On 09/07/2011 09:55 PM, Konrad Rzeszutek Wilk wrote:
> >   	If you don't know what to do here, say N.
> >  +
> >  +menuconfig VFIO_PCI
> >  +	bool "VFIO support for PCI devices"
> >  +	depends on VFIO&&  PCI
> >  +	default y if X86
>
> Hahah.. And Linus is going to tear your behind for that.
>
> Default should be 'n'

It depends on VFIO, which presumably defaults to n.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 4/5] VFIO: Add PCI device support
  2011-09-08  7:52     ` Avi Kivity
@ 2011-09-08 21:52       ` Alex Williamson
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-08 21:52 UTC (permalink / raw)
  To: Avi Kivity
  Cc: aafabbri, aik, kvm, pmac, qemu-devel, joerg.roedel,
	Konrad Rzeszutek Wilk, agraf, dwg, chrisw, B08248, iommu,
	linux-pci, B07421, benve

On Thu, 2011-09-08 at 10:52 +0300, Avi Kivity wrote:
> On 09/07/2011 09:55 PM, Konrad Rzeszutek Wilk wrote:
> > >   	If you don't know what to do here, say N.
> > >  +
> > >  +menuconfig VFIO_PCI
> > >  +	bool "VFIO support for PCI devices"
> > >  +	depends on VFIO&&  PCI
> > >  +	default y if X86
> >
> > Hahah.. And Linus is going to tear your behind for that.
> >
> > Default should be 'n'
> 
> It depends on VFIO, which presumably defaults to n.

Yes, exactly.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/5] VFIO-NG group/device/iommu framework
  2011-09-07 11:58 ` [RFC PATCH 0/5] VFIO-NG group/device/iommu framework Alexander Graf
@ 2011-09-08 21:54   ` Alex Williamson
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-08 21:54 UTC (permalink / raw)
  To: Alexander Graf
  Cc: aafabbri, aik, kvm, pmac, joerg.roedel, qemu-devel, dwg, chrisw,
	B08248, iommu, avi, linux-pci, B07421, benve

On Wed, 2011-09-07 at 13:58 +0200, Alexander Graf wrote:
> On 01.09.2011, at 21:50, Alex Williamson wrote:
> 
> > Trying to move beyond talking about how VFIO should work to
> > re-writing the code.  This is pre-alpha, known broken, will
> > probably crash your system but it illustrates some of how
> > I see groups, devices, and iommus interacting.  This is just
> > the framework, no code to actually support user space drivers
> > or device assignment yet.
> > 
> > The iommu portions are still using the "FIXME" PCI specific
> > hooks.  Once Joerg gets some buy-in on his bus specific iommu
> > patches, we can move to that.
> > 
> > The group management is more complicated than I'd like and
> > you can get groups into a bad state by killing the test program
> > with devices/iommus open.  The locking is overly simplistic.
> > But, it's a start.  Please make constructive comments and
> > suggestions.  Patches based on v3.0.  Thanks,
> 
> Looks pretty reasonable to me so far, but I guess we only know for sure once we have non-PCI implemented and working with this scheme as well.
> Btw I couldn't find the PCI BAR regions mmaps and general config space exposure. Where has that gone?

I ripped it out for now just to work on the group/device/iommu
framework.  I didn't see a need to make a functional RFC just to get
some buy-in on the framework.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver
  2011-09-07 14:52   ` [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver Konrad Rzeszutek Wilk
@ 2011-09-19 16:42     ` Alex Williamson
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2011-09-19 16:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: chrisw, aik, pmac, dwg, joerg.roedel, agraf, benve, aafabbri,
	B08248, B07421, avi, kvm, qemu-devel, iommu, linux-pci

Sorry for the delay, just getting back from LPC and some time off...

On Wed, 2011-09-07 at 10:52 -0400, Konrad Rzeszutek Wilk wrote:
> > +static long vfio_iommu_unl_ioctl(struct file *filep,
> > +				 unsigned int cmd, unsigned long arg)
> > +{
> > +	struct vfio_iommu *viommu = filep->private_data;
> > +	struct vfio_dma_map dm;
> > +	int ret = -ENOSYS;
> > +
> > +	switch (cmd) {
> > +	case VFIO_IOMMU_MAP_DMA:
> > +		if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> > +			return -EFAULT;
> > +		ret = 0; // XXX - Do something
> 
> <chuckles>

Truly an RFC ;)

> > +		if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> > +			ret = -EFAULT;
> > +		break;
> > +
> > +	case VFIO_IOMMU_UNMAP_DMA:
> > +		if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> > +			return -EFAULT;
> > +		ret = 0; // XXX - Do something
> > +		if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> > +			ret = -EFAULT;
> > +		break;
> > +	}
> > +	return ret;
> > +}
> > +
> > +#ifdef CONFIG_COMPAT
> > +static long vfio_iommu_compat_ioctl(struct file *filep,
> > +				    unsigned int cmd, unsigned long arg)
> > +{
> > +	arg = (unsigned long)compat_ptr(arg);
> > +	return vfio_iommu_unl_ioctl(filep, cmd, arg);
> > +}
> > +#endif	/* CONFIG_COMPAT */
> > +
> > +const struct file_operations vfio_iommu_fops = {
> > +	.owner		= THIS_MODULE,
> > +	.release	= vfio_iommu_release,
> > +	.unlocked_ioctl	= vfio_iommu_unl_ioctl,
> > +#ifdef CONFIG_COMPAT
> > +	.compat_ioctl	= vfio_iommu_compat_ioctl,
> > +#endif
> > +};
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> .. snip..
> > +int vfio_group_add_dev(struct device *dev, void *data)
> > +{
> > +	struct vfio_device_ops *ops = data;
> > +	struct list_head *pos;
> > +	struct vfio_group *vgroup = NULL;
> > +	struct vfio_device *vdev = NULL;
> > +	unsigned int group;
> > +	int ret = 0, new_group = 0;
> 
> 'new_group' should probably be 'bool'.

ok

> > +
> > +	if (iommu_device_group(dev, &group))
> > +		return 0;
> 
> -EEXIST?

I think I made this return 0 because it's called from device add
notifiers and walking devices lists.  It's ok for it to fail, not all
devices have to be backed by an iommu, they just won't show up in vfio.
Maybe I should leave that to the leaf callers though.  EINVAL is
probably more appropriate.

> > +
> > +	mutex_lock(&vfio.group_lock);
> > +
> > +	list_for_each(pos, &vfio.group_list) {
> > +		vgroup = list_entry(pos, struct vfio_group, next);
> > +		if (vgroup->group == group)
> > +			break;
> > +		vgroup = NULL;
> > +	}
> > +
> > +	if (!vgroup) {
> > +		int id;
> > +
> > +		if (unlikely(idr_pre_get(&vfio.idr, GFP_KERNEL) == 0)) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +		vgroup = kzalloc(sizeof(*vgroup), GFP_KERNEL);
> > +		if (!vgroup) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +
> > +		vgroup->group = group;
> > +		INIT_LIST_HEAD(&vgroup->device_list);
> > +
> > +		ret = idr_get_new(&vfio.idr, vgroup, &id);
> > +		if (ret == 0 && id > MINORMASK) {
> > +			idr_remove(&vfio.idr, id);
> > +			kfree(vgroup);
> > +			ret = -ENOSPC;
> > +			goto out;
> > +		}
> > +
> > +		vgroup->devt = MKDEV(MAJOR(vfio.devt), id);
> > +		list_add(&vgroup->next, &vfio.group_list);
> > +		device_create(vfio.class, NULL, vgroup->devt,
> > +			      vgroup, "%u", group);
> > +
> > +		new_group = 1;
> > +	} else {
> > +		list_for_each(pos, &vgroup->device_list) {
> > +			vdev = list_entry(pos, struct vfio_device, next);
> > +			if (vdev->dev == dev)
> > +				break;
> > +			vdev = NULL;
> > +		}
> > +	}
> > +
> > +	if (!vdev) {
> > +		/* Adding a device for a group that's already in use? */
> > +		/* Maybe we should attach to the domain so others can't */
> > +		BUG_ON(vgroup->container &&
> > +		       vgroup->container->iommu &&
> > +		       vgroup->container->iommu->refcnt);
> > +
> > +		vdev = ops->new(dev);
> > +		if (IS_ERR(vdev)) {
> > +			/* If we just created this vgroup, tear it down */
> > +			if (new_group) {
> > +				device_destroy(vfio.class, vgroup->devt);
> > +				idr_remove(&vfio.idr, MINOR(vgroup->devt));
> > +				list_del(&vgroup->next);
> > +				kfree(vgroup);
> > +			}
> > +			ret = PTR_ERR(vdev);
> > +			goto out;
> > +		}
> > +		list_add(&vdev->next, &vgroup->device_list);
> > +		vdev->dev = dev;
> > +		vdev->ops = ops;
> > +		vdev->vfio = &vfio;
> > +	}
> > +out:
> > +	mutex_unlock(&vfio.group_lock);
> > +	return ret;
> > +}
> > +
> > +void vfio_group_del_dev(struct device *dev)
> > +{
> > +	struct list_head *pos;
> > +	struct vfio_container *vcontainer;
> > +	struct vfio_group *vgroup = NULL;
> > +	struct vfio_device *vdev = NULL;
> > +	unsigned int group;
> > +
> > +	if (iommu_device_group(dev, &group))
> > +		return;
> > +
> > +	mutex_lock(&vfio.group_lock);
> > +
> > +	list_for_each(pos, &vfio.group_list) {
> > +		vgroup = list_entry(pos, struct vfio_group, next);
> > +		if (vgroup->group == group)
> > +			break;
> > +		vgroup = NULL;
> > +	}
> > +
> > +	if (!vgroup)
> > +		goto out;
> > +
> > +	vcontainer = vgroup->container;
> > +
> > +	list_for_each(pos, &vgroup->device_list) {
> > +		vdev = list_entry(pos, struct vfio_device, next);
> > +		if (vdev->dev == dev)
> > +			break;
> > +		vdev = NULL;
> > +	}
> > +
> > +	if (!vdev)
> > +		goto out;
> > +
> > +	/* XXX Did a device we're using go away? */
> > +	BUG_ON(vdev->refcnt);
> > +
> > +	if (vcontainer && vcontainer->iommu) {
> > +		iommu_detach_device(vcontainer->iommu->domain, vdev->dev);
> > +		vfio_container_reset_read(vcontainer);
> > +	}
> > +
> > +	list_del(&vdev->next);
> > +	vdev->ops->free(vdev);
> > +
> > +	if (list_empty(&vgroup->device_list) && vgroup->refcnt == 0) {
> > +		device_destroy(vfio.class, vgroup->devt);
> > +		idr_remove(&vfio.idr, MINOR(vgroup->devt));
> > +		list_del(&vgroup->next);
> > +		kfree(vgroup);
> > +	}
> > +out:
> > +	mutex_unlock(&vfio.group_lock);
> > +}
> > +
> > +static int __vfio_group_viable(struct vfio_container *vcontainer)
> 
> Just return 'bool'

Sure

> > +{
> > +	struct list_head *gpos, *dpos;
> > +
> > +	list_for_each(gpos, &vfio.group_list) {
> > +		struct vfio_group *vgroup;
> > +		vgroup = list_entry(gpos, struct vfio_group, next);
> > +		if (vgroup->container != vcontainer)
> > +			continue;
> > +
> > +		list_for_each(dpos, &vgroup->device_list) {
> > +			struct vfio_device *vdev;
> > +			vdev = list_entry(dpos, struct vfio_device, next);
> > +
> > +			if (!vdev->dev->driver ||
> > +			    vdev->dev->driver->owner != THIS_MODULE)
> > +				return 0;
> > +		}
> > +	}
> > +	return 1;
> > +}
> > +
> > +static int __vfio_close_iommu(struct vfio_container *vcontainer)
> > +{
> > +	struct list_head *gpos, *dpos;
> > +	struct vfio_iommu *viommu = vcontainer->iommu;
> > +	struct vfio_group *vgroup;
> > +	struct vfio_device *vdev;
> > +
> > +	if (!viommu)
> > +		return 0;
> > +
> > +	if (viommu->refcnt)
> > +		return -EBUSY;
> > +
> > +	list_for_each(gpos, &vfio.group_list) {
> > +		vgroup = list_entry(gpos, struct vfio_group, next);
> > +		if (vgroup->container != vcontainer)
> > +			continue;
> > +
> > +		list_for_each(dpos, &vgroup->device_list) {
> > +			vdev = list_entry(dpos, struct vfio_device, next);
> > +			iommu_detach_device(viommu->domain, vdev->dev);
> > +			vdev->iommu = NULL;
> > +		}
> > +	}
> > +	iommu_domain_free(viommu->domain);
> > +	kfree(viommu);
> > +	vcontainer->iommu = NULL;
> > +	return 0;
> > +}
> > +
> > +static int __vfio_open_iommu(struct vfio_container *vcontainer)
> > +{
> > +	struct list_head *gpos, *dpos;
> > +	struct vfio_iommu *viommu;
> > +	struct vfio_group *vgroup;
> > +	struct vfio_device *vdev;
> > +
> > +	if (!__vfio_group_viable(vcontainer))
> > +		return -EBUSY;
> > +
> > +	viommu = kzalloc(sizeof(*viommu), GFP_KERNEL);
> > +	if (!viommu)
> > +		return -ENOMEM;
> > +
> > +	viommu->domain = iommu_domain_alloc();
> > +	if (!viommu->domain) {
> > +		kfree(viommu);
> > +		return -EFAULT;
> > +	}
> > +
> > +	viommu->vfio = &vfio;
> > +	vcontainer->iommu = viommu;
> > +
> 
> No need for
>   mutex_lock(&vfio.group_lock);
> 
> Ah, you already hold the lock when using this function.

Right, just really simple, broad locking right now.

> > +	list_for_each(gpos, &vfio.group_list) {
> > +		vgroup = list_entry(gpos, struct vfio_group, next);
> > +		if (vgroup->container != vcontainer)
> > +			continue;
> > +
> > +		list_for_each(dpos, &vgroup->device_list) {
> > +			int ret;
> > +
> > +			vdev = list_entry(dpos, struct vfio_device, next);
> > +
> > +			ret = iommu_attach_device(viommu->domain, vdev->dev);
> > +			if (ret) {
> > +				__vfio_close_iommu(vcontainer);
> > +				return ret;
> > +			}
> > +			vdev->iommu = viommu;
> > +		}
> > +	}
> > +
> > +	if (!allow_unsafe_intrs &&
> > +	    !iommu_domain_has_cap(viommu->domain, IOMMU_CAP_INTR_REMAP)) {
> > +		__vfio_close_iommu(vcontainer);
> > +		return -EFAULT;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int vfio_group_merge(struct vfio_group *vgroup, int fd)
> > +{
> > +	struct vfio_group *vgroup2;
> > +	struct iommu_domain *domain;
> > +	struct list_head *pos;
> > +	struct file *file;
> > +	int ret = 0;
> > +
> > +	mutex_lock(&vfio.group_lock);
> > +
> > +	file = fget(fd);
> > +	if (!file) {
> > +		ret = -EBADF;
> > +		goto out_noput;
> > +	}
> > +	if (file->f_op != &vfio_group_fops) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	vgroup2 = file->private_data;
> > +	if (!vgroup2 || vgroup2 == vgroup || vgroup2->mm != vgroup->mm ||
> > +	    (vgroup2->container->iommu && vgroup2->container->iommu->refcnt)) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	if (!vgroup->container->iommu) {
> > +		ret = __vfio_open_iommu(vgroup->container);
> > +		if (ret)
> > +			goto out;
> > +	}
> > +
> > +	if (!vgroup2->container->iommu) {
> > +		ret = __vfio_open_iommu(vgroup2->container);
> > +		if (ret)
> > +			goto out;
> > +	}
> > +
> > +	if (iommu_domain_has_cap(vgroup->container->iommu->domain,
> > +				 IOMMU_CAP_CACHE_COHERENCY) !=
> > +	    iommu_domain_has_cap(vgroup2->container->iommu->domain,
> > +				 IOMMU_CAP_CACHE_COHERENCY)) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	ret = __vfio_close_iommu(vgroup2->container);
> > +	if (ret)
> > +		goto out;
> > +
> > +	domain = vgroup->container->iommu->domain;
> > +
> > +	list_for_each(pos, &vgroup2->device_list) {
> > +		struct vfio_device *vdev;
> > +
> > +		vdev = list_entry(pos, struct vfio_device, next);
> > +
> > +		ret = iommu_attach_device(domain, vdev->dev);
> > +		if (ret) {
> > +			list_for_each(pos, &vgroup2->device_list) {
> > +				struct vfio_device *vdev2;
> > +
> > +				vdev2 = list_entry(pos,
> > +						   struct vfio_device, next);
> > +				if (vdev2 == vdev)
> > +					break;
> > +
> > +				iommu_detach_device(domain, vdev2->dev);
> > +				vdev2->iommu = NULL;
> > +			}
> > +			goto out;
> > +		}
> > +		vdev->iommu = vgroup->container->iommu;
> > +	}
> > +
> > +	kfree(vgroup2->container->read_buf);
> > +	kfree(vgroup2->container);
> > +
> > +	vgroup2->container = vgroup->container;
> > +	vgroup->container->refcnt++;
> > +	vfio_container_reset_read(vgroup->container);
> > +
> > +out:
> > +	fput(file);
> > +out_noput:
> > +	mutex_unlock(&vfio.group_lock);
> > +	return ret;
> > +}
> > +
> > +static int vfio_group_unmerge(struct vfio_group *vgroup, int fd)
> > +{
> > +	struct vfio_group *vgroup2;
> > +	struct vfio_container *vcontainer2;
> > +	struct vfio_device *vdev;
> > +	struct list_head *pos;
> > +	struct file *file;
> > +	int ret = 0;
> > +
> > +	vcontainer2 = kzalloc(sizeof(*vcontainer2), GFP_KERNEL);
> > +	if (!vcontainer2)
> > +		return -ENOMEM;
> > +
> > +	mutex_lock(&vfio.group_lock);
> > +
> > +	file = fget(fd);
> > +	if (!file) {
> > +		ret = -EBADF;
> > +		goto out_noput;
> > +	}
> > +	if (file->f_op != &vfio_group_fops) {
> 
> Hm, I think scripts/checkpath.pl will not like that, but as
> you said - it is RFC.

Will check

Thanks for the review!

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-09-19 16:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20110901194915.2391.97400.stgit@s20.home>
2011-09-01 19:50 ` [RFC PATCH 1/5] iommu: Add iommu_device_group callback and iommu_group sysfs entry Alex Williamson
2011-09-01 19:50 ` [RFC PATCH 2/5] intel-iommu: Implement iommu_device_group Alex Williamson
2011-09-01 19:50 ` [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver Alex Williamson
2011-09-01 19:50 ` [RFC PATCH 4/5] VFIO: Add PCI device support Alex Williamson
2011-09-01 19:50 ` [RFC PATCH 5/5] VFIO: Simple test tool Alex Williamson
2011-09-07 11:58 ` [RFC PATCH 0/5] VFIO-NG group/device/iommu framework Alexander Graf
2011-09-08 21:54   ` Alex Williamson
     [not found] ` <20110901195043.2391.31843.stgit@s20.home>
2011-09-07 14:52   ` [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver Konrad Rzeszutek Wilk
2011-09-19 16:42     ` Alex Williamson
     [not found] ` <20110901195050.2391.12028.stgit@s20.home>
2011-09-07 18:55   ` [RFC PATCH 4/5] VFIO: Add PCI device support Konrad Rzeszutek Wilk
2011-09-08  7:52     ` Avi Kivity
2011-09-08 21:52       ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).