linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Device isolation infrastructure v2
@ 2011-12-15  6:25 David Gibson
  2011-12-15  6:25 ` [PATCH 1/3] device_isolation: Infrastructure for managing device isolation groups David Gibson
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: David Gibson @ 2011-12-15  6:25 UTC (permalink / raw)
  To: alex.williamson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

Here's the second spin of my preferred approach to handling grouping
of devices for safe assignment to guests.

Changes since v1:
 * Many name changes and file moves for improved consistency
 * Bugfixes and cleanups
 * The interface to the next layer up is considerably fleshed out,
   although it still needs work.
 * Example initialization of groups for p5ioc2 and p7ioc.

TODO:
 * Need sample initialization of groups for intel and/or amd iommus
 * Use of sysfs attributes to control group permission is probably a
   mistake.  Although it seems a bit odd, registering a chardev for
   each group is probably better, because perms can be set from udev
   rules, just like everything else.
 * Need more details of what the binder structure will need to
   contain.
 * Handle complete removal of groups.
 * Clarify what will need to happen on the hot unplug path.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/3] device_isolation: Infrastructure for managing device isolation groups
  2011-12-15  6:25 [RFC] Device isolation infrastructure v2 David Gibson
@ 2011-12-15  6:25 ` David Gibson
  2011-12-15  6:25 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2011-12-15  6:25 UTC (permalink / raw)
  To: alex.williamson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

From: Alexey Kardashevskiy <aik@ozlabs.ru>

In order to safely drive a device with a userspace driver, or to pass
it through to a guest system, we must first make sure that the device
is isolated in such a way that it cannot interfere with other devices
on the system.  This isolation is only available on some systems and
will generally require an iommu, and might require other support in
bridges or other system hardware.

Often, it's not possible to isolate every device from every device
from every other device in the system.  For example, certain PCI/PCIe
bridge configurations mean that an iommu cannot reliably distinguish
which device behind the bridge initiated a DMA transaction.  Similarly
some buggy multifunction devices initiate all DMAs as function 0, so
the functions cannot be isolated from each other, even if the IOMMU
normally allows this.

Therefore, the user, and code to allow userspace drivers or guest
passthrough, needs a way to determine which devices can be isolated
from which others.  This patch adds infrastructure to handle this by
introducing the concept of a "device isolation group" - a group of
devices which can, as a unit, be safely isolated from the rest of the
system and therefore can be, as a unit, safely assigned to an
unprivileged used or guest.  That is, the groups represent the minimum
granularity with which devices may be assigned to untrusted
components.

This code manages groups, but does not create them or of itself allow
use of grouped devices by a guest.  Creating groups would be done by
iommu or bridge drivers, using the interface this patch provides.
Using the groups is expected to be provided by something similar to
vfio, yet to be written, again using interfaces provided by this code.

What this code does do, is:

 * Allow iommu/bridge drivers to easily create groups and put devices
   in them.

 * Publish the groups in /sys/isolation with links from the group to
   its constituent devices and back.

 * Allow driver matching be disallowed on a group via a sysfs
   attribute.  When this is done, existing kernel drivers will be
   unbound from the group, and drivers will no longer match and bind
   to devices in the group.  Removing kernel drivers is clearly a
   prerequisite before the group can be assigned to a guest.

 * Allows a kernel subsystem to "bind" a group, effectively claiming
   exclusive use of it.  This can be done unprivileged, provided the
   process initiating the bind has permission to the group as
   determined by uid/gid/mode attributes.

It's expected that a subsystem like VFIO would use the "bind"
interface to claim the group for itself before passing the group on
to userspace or a guest.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 drivers/base/Kconfig             |    3 +
 drivers/base/Makefile            |    1 +
 drivers/base/base.h              |    5 +
 drivers/base/core.c              |    6 +
 drivers/base/device_isolation.c  |  509 ++++++++++++++++++++++++++++++++++++++
 drivers/base/init.c              |    2 +
 include/linux/device.h           |    5 +
 include/linux/device_isolation.h |  124 +++++++++
 8 files changed, 655 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/device_isolation.c
 create mode 100644 include/linux/device_isolation.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..ed464c9 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,7 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config DEVICE_ISOLATION
+	bool "Enable isolating devices for safe pass-through to guests or user space."
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..79ac7b3 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_MODULES)	+= module.o
 endif
 obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
 obj-$(CONFIG_REGMAP)	+= regmap/
+obj-$(CONFIG_DEVICE_ISOLATION) += device_isolation.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
 
diff --git a/drivers/base/base.h b/drivers/base/base.h
index a34dca0..8ee0c95 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -24,6 +24,9 @@
  * bus_type/class to be statically allocated safely.  Nothing outside of the
  * driver core should ever touch these fields.
  */
+
+#include <linux/device_isolation.h>
+
 struct subsys_private {
 	struct kset subsys;
 	struct kset *devices_kset;
@@ -108,6 +111,8 @@ extern int driver_probe_device(struct device_driver *drv, struct device *dev);
 static inline int driver_match_device(struct device_driver *drv,
 				      struct device *dev)
 {
+	if (!device_isolation_driver_match_allowed(dev))
+		return 0;
 	return drv->bus->match ? drv->bus->match(dev, drv) : 1;
 }
 
diff --git a/drivers/base/core.c b/drivers/base/core.c
index bc8729d..ca8a0fa 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -22,6 +22,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/async.h>
+#include <linux/device_isolation.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -593,6 +594,9 @@ void device_initialize(struct device *dev)
 	lockdep_set_novalidate_class(&dev->mutex);
 	spin_lock_init(&dev->devres_lock);
 	INIT_LIST_HEAD(&dev->devres_head);
+#ifdef CONFIG_DEVICE_ISOLATION
+	dev->di_group = NULL;
+#endif
 	device_pm_init(dev);
 	set_dev_node(dev, -1);
 }
@@ -993,6 +997,8 @@ int device_add(struct device *dev)
 				class_intf->add_dev(dev, class_intf);
 		mutex_unlock(&dev->class->p->class_mutex);
 	}
+
+	device_isolation_dev_update_sysfs(dev);
 done:
 	put_device(dev);
 	return error;
diff --git a/drivers/base/device_isolation.c b/drivers/base/device_isolation.c
new file mode 100644
index 0000000..b13bb46
--- /dev/null
+++ b/drivers/base/device_isolation.c
@@ -0,0 +1,509 @@
+/*
+ * device_isolation.c
+ *
+ * Handling of device isolation groups, groups of hardware devices
+ * which are sufficiently isolated by an IOMMU from the rest of the
+ * system that they can be safely given (as a unit) to an unprivileged
+ * user process or guest system to drive.
+ *
+ * Copyright (c) 2011 Alexey Kardashevskiy, IBM Corporation
+ * Copyright (c) 2011 David Gibson, IBM Corporation
+ *
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/device_isolation.h>
+
+static struct kset *device_isolation_kset;
+
+struct dig_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct device_isolation_group *group, char *buf);
+	ssize_t (*store)(struct device_isolation_group *group, const char *buf,
+			size_t count);
+};
+
+#define DIG_ATTR(_name, _mode, _show, _store)	\
+	struct dig_attribute dig_attr_##_name = \
+		__ATTR(_name, _mode, _show, _store)
+
+#define to_dig_attr(_attr) \
+	container_of(_attr, struct dig_attribute, attr)
+
+static ssize_t dig_attr_show(struct kobject *kobj, struct attribute *attr,
+			     char *buf)
+{
+	struct dig_attribute *dig_attr = to_dig_attr(attr);
+	struct device_isolation_group *group =
+		container_of(kobj, struct device_isolation_group, kobj);
+	ssize_t ret = -EIO;
+
+	if (dig_attr->show)
+		ret = dig_attr->show(group, buf);
+	return ret;
+}
+
+static ssize_t dig_attr_store(struct kobject *kobj, struct attribute *attr,
+			      const char *buf, size_t count)
+{
+	struct dig_attribute *dig_attr = to_dig_attr(attr);
+	struct device_isolation_group *group =
+		container_of(kobj, struct device_isolation_group, kobj);
+	ssize_t ret = -EIO;
+
+	if (dig_attr->store)
+		ret = dig_attr->store(group, buf, count);
+	return ret;
+}
+
+static void dig_release(struct kobject *kobj)
+{
+	/* FIXME: No way for groups to be removed as yet */
+	BUG();
+}
+
+static const struct sysfs_ops dig_sysfs_ops = {
+	.show	= dig_attr_show,
+	.store	= dig_attr_store,
+};
+
+static struct kobj_type dig_ktype = {
+	.sysfs_ops	= &dig_sysfs_ops,
+	.release	= dig_release,
+};
+
+static ssize_t dig_show_binder(struct device_isolation_group *group, char *buf)
+{
+	ssize_t ret = 0;
+
+	mutex_lock(&group->mutex);
+	if (group->binder)
+		ret = sprintf(buf, "%s", group->binder->name);
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+#if 0
+/* ONLY FOR DEBUG PURPOSE */
+static ssize_t dig_set_binder(struct device_isolation_group *group,
+			      const char *buf, size_t count)
+{
+	struct device_isolation_binder *binder;
+
+	if ((0 == buf[0]) || ('0' == buf[0])) {
+		binder = group->binder;
+		if (group->allow_driver_match) {
+			printk(KERN_ERR "device_isolation: not exclusive!!!\n");
+		} else {
+			device_isolation_unbind(group, binder);
+			if (binder) {
+				kfree(binder->name);
+				kfree(binder);
+			}
+		}
+	} else {
+		binder = kzalloc(sizeof(*binder), GFP_KERNEL);
+		binder->name = kstrdup(buf, GFP_KERNEL); 
+		if (0 > device_isolation_bind(group, binder, NULL)) {
+			kfree(binder->name);
+			kfree(binder);
+		}
+	}
+	return count;
+}
+
+static DIG_ATTR(binder, S_IWUSR | S_IRUSR | S_IROTH | S_IWOTH,
+		dig_show_binder, dig_set_binder);
+#endif
+
+static DIG_ATTR(binder, S_IWUSR | S_IRUSR, dig_show_binder, NULL);
+
+static ssize_t dig_show_allow_driver_match(struct device_isolation_group *group,
+					   char *buf)
+{
+	ssize_t ret;
+
+	mutex_lock(&group->mutex);
+	ret = sprintf(buf, "%u", !!group->allow_driver_match);
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+static ssize_t dig_set_allow_driver_match(struct device_isolation_group *group,
+					  const char *buf, size_t count)
+{
+	switch (buf[0]) {
+	case '0':
+		device_isolation_disallow_driver_match(group);
+		break;
+	case '1':
+		device_isolation_allow_driver_match(group);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return count;
+}
+
+static DIG_ATTR(allow_driver_match, S_IWUSR | S_IRUSR,
+		dig_show_allow_driver_match, dig_set_allow_driver_match);
+
+static ssize_t dig_show_uid(struct device_isolation_group *group,
+			      char *buf)
+{
+	ssize_t ret;
+
+	mutex_lock(&group->mutex);
+	ret = sprintf(buf, "%u", group->uid);
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+static ssize_t dig_set_uid(struct device_isolation_group *group,
+		     const char *buf, size_t count)
+{
+	long val;
+
+	if (strict_strtol(buf, 10, &val))
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	group->uid = val;
+	mutex_unlock(&group->mutex);
+
+	return count;
+}
+
+static DIG_ATTR(uid, S_IWUSR | S_IRUSR, dig_show_uid, dig_set_uid);
+
+static ssize_t dig_show_gid(struct device_isolation_group *group,
+		     char *buf)
+{
+	ssize_t ret;
+
+	mutex_lock(&group->mutex);
+	ret = sprintf(buf, "%u", group->gid);
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+static ssize_t dig_set_gid(struct device_isolation_group *group,
+		     const char *buf, size_t count)
+{
+	long val;
+
+	if (current_uid())
+		return -EACCES;
+
+	if (strict_strtol(buf, 10, &val))
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	group->gid = val;
+	mutex_unlock(&group->mutex);
+
+	return count;
+}
+
+static DIG_ATTR(gid, S_IWUSR | S_IRUSR, dig_show_gid, dig_set_gid);
+
+static ssize_t dig_show_mode(struct device_isolation_group *group,
+		     char *buf)
+{
+	ssize_t ret;
+
+	mutex_lock(&group->mutex);
+	ret = sprintf(buf, "0%03o", group->mode);
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+static ssize_t dig_set_mode(struct device_isolation_group *group,
+		     const char *buf, size_t count)
+{
+	long val;
+
+	if (current_uid())
+		return -EACCES;
+
+	if (strict_strtol(buf, 8, &val))
+		return -EINVAL;
+
+	if (val & ~(S_IWUGO | S_IRUGO))
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	group->mode = val;
+	mutex_unlock(&group->mutex);
+
+	return count;
+}
+
+static DIG_ATTR(mode, S_IWUSR | S_IRUSR, dig_show_mode, dig_set_mode);
+
+int device_isolation_group_init(struct device_isolation_group *group,
+				const char *fmt, ...)
+{
+	int ret;
+	va_list args;
+
+	kobject_init(&group->kobj, &dig_ktype);
+	mutex_init(&group->mutex);
+	INIT_LIST_HEAD(&group->devices);
+	group->mode = 0600;
+	group->allow_driver_match = true;
+
+	group->kobj.kset = device_isolation_kset;
+
+	va_start(args, fmt);
+	ret = kobject_set_name_vargs(&group->kobj, fmt, args);
+	va_end(args);
+	if (ret < 0) {
+		printk(KERN_ERR "device_isolation: "
+		       "kobject_set_name_vargs() failed\n");
+		return ret;
+	}
+
+	ret = kobject_add(&group->kobj, NULL, NULL);
+	if (ret < 0) {
+		printk(KERN_ERR "device_isolation: "
+		       "kobject_add() failed for %s\n",
+		       kobject_name(&group->kobj));
+		return ret;
+	}
+
+
+#define CREATE_ATTR(_attr) \
+	do { \
+		if (sysfs_create_file(&group->kobj, \
+				      &dig_attr_##_attr.attr) < 0) \
+		printk(KERN_WARNING "device_isolation: create \"" \
+			#_attr "\" \failed for %s (errno=%d)\n", \
+		       kobject_name(&group->kobj), ret); \
+	} while (0)
+
+	CREATE_ATTR(allow_driver_match);
+	CREATE_ATTR(binder);
+	CREATE_ATTR(uid);
+	CREATE_ATTR(gid);
+	CREATE_ATTR(mode);
+
+#undef CREATE_ATTR
+
+	printk(KERN_DEBUG "device_isolation: group %s created\n",
+		kobject_name(&group->kobj));
+
+	return 0;
+}
+
+void device_isolation_dev_add(struct device_isolation_group *group,
+			      struct device *dev)
+{
+	printk(KERN_DEBUG "device_isolation: adding device %s to group %s\n",
+		kobject_name(&dev->kobj), kobject_name(&group->kobj));
+
+	mutex_lock(&group->mutex);
+	list_add_tail(&dev->di_list, &group->devices);
+	dev->di_group = group;
+	mutex_unlock(&group->mutex);
+}
+
+void device_isolation_dev_remove(struct device *dev)
+{
+	struct device_isolation_group *group = dev->di_group;
+
+	BUG_ON(!group);
+
+	mutex_lock(&group->mutex);
+	list_del(&dev->di_list);
+	mutex_unlock(&group->mutex);
+}
+
+int device_isolation_dev_update_sysfs(struct device *dev)
+{
+	int ret;
+	struct device_isolation_group *group = dev->di_group;
+
+	if (!group)
+		return 0;
+
+	printk(KERN_DEBUG "device_isolation: updating links for %s in "
+			"group %s\n", kobject_name(&dev->kobj),
+			kobject_name(&group->kobj));
+
+	mutex_lock(&group->mutex);
+
+	ret = sysfs_create_link(&dev->kobj, &group->kobj, "device_isolation_group");
+	if (0 > ret)
+		printk(KERN_WARNING "device_isolation: create device_isolation_group "
+			"link failed for %s -> %s, errno=%i\n",
+			kobject_name(&dev->kobj), kobject_name(&group->kobj), ret);
+
+	ret = sysfs_create_link(&group->kobj, &dev->kobj, kobject_name(&dev->kobj));
+	if (0 > ret)
+		printk(KERN_WARNING "device_isolation: create "
+			"link failed for %s -> %s, errno=%i\n",
+			kobject_name(&dev->kobj), kobject_name(&group->kobj),
+			ret);
+
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+int device_isolation_allow_driver_match(struct device_isolation_group *group)
+{
+	int ret;
+	struct device *dev;
+
+	mutex_lock(&group->mutex);
+
+	if (group->allow_driver_match) {
+		/* Nothing to do */
+		ret = 0;
+		goto out;
+	}
+
+	if (group->binder) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	group->allow_driver_match = true;
+
+	list_for_each_entry(dev, &group->devices, di_list) {
+		printk(KERN_DEBUG "device_isolation: reprobing %s\n",
+				kobject_name(&dev->kobj));
+		ret = device_reprobe(dev);
+		if (ret < 0)
+			printk(KERN_WARNING "device_isolation: Error %d "
+			       "reprobing device %s\n", ret,
+			       kobject_name(&dev->kobj));
+	}
+
+	ret = 0;
+
+out:
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+int device_isolation_disallow_driver_match(struct device_isolation_group *group)
+{
+	int ret;
+	struct device *dev;
+
+	BUG_ON(!group);
+
+	mutex_lock(&group->mutex);
+
+	if (!group->allow_driver_match) {
+		/* Nothing to do */
+		ret = 0;
+		goto out;
+	}
+
+	BUG_ON(group->binder);
+
+	group->allow_driver_match = false;
+
+	list_for_each_entry(dev, &group->devices, di_list) {
+		printk(KERN_DEBUG "device_isolation: reprobing %s\n",
+				kobject_name(&dev->kobj));
+		/* We need to drop the lock because reprobe can block */
+		device_release_driver(dev);
+	}
+
+out:
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+static bool permission_check(struct device_isolation_group *group, int mask)
+{
+	unsigned int mode = group->mode;
+
+	mask &= MAY_READ | MAY_WRITE | MAY_EXEC | MAY_NOT_BLOCK;
+
+	if (likely(current_uid() == group->uid))
+		mode >>= 6;
+	else if (in_group_p(group->gid))
+		mode >>= 3;
+
+	if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
+		return true;
+
+	printk(KERN_DEBUG "device_isolation: permission check failed.\n");
+
+	return false;
+}
+
+int device_isolation_bind(struct device_isolation_group *group,
+			  struct device_isolation_binder *binder,
+			  void *priv)
+{
+	int ret;
+
+	printk(KERN_DEBUG "device_isolation: bind group %s to %s\n",
+	       kobject_name(&group->kobj), binder->name);
+
+	mutex_lock(&group->mutex);
+
+	if (!capable(CAP_SYS_ADMIN) &&
+	    !permission_check(group, MAY_WRITE | MAY_READ)) {
+		ret = -EACCES;
+		goto out;
+	}
+
+	if (group->allow_driver_match) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (group->binder) {
+		ret = -EEXIST;
+		goto out;
+	}
+
+	group->binder = binder;
+	group->binder_priv = priv;
+	ret = 0;
+
+out:
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+void device_isolation_unbind(struct device_isolation_group *group,
+			     struct device_isolation_binder *binder)
+{
+	printk(KERN_DEBUG "device_isolation: unbind group %s from %s\n",
+	       kobject_name(&group->kobj), binder->name);
+
+	mutex_lock(&group->mutex);
+
+	BUG_ON(group->allow_driver_match);
+	BUG_ON(group->binder != binder);
+
+	group->binder = NULL;
+	group->binder_priv = NULL;
+
+	mutex_unlock(&group->mutex);
+}
+
+int __init device_isolation_init(void)
+{
+	device_isolation_kset = kset_create_and_add("isolation", NULL, NULL);
+	if (!device_isolation_kset)
+		return -ENOMEM;
+	return 0;
+}
diff --git a/drivers/base/init.c b/drivers/base/init.c
index c8a934e..7237e80 100644
--- a/drivers/base/init.c
+++ b/drivers/base/init.c
@@ -8,6 +8,7 @@
 #include <linux/device.h>
 #include <linux/init.h>
 #include <linux/memory.h>
+#include <linux/device_isolation.h>
 
 #include "base.h"
 
@@ -24,6 +25,7 @@ void __init driver_init(void)
 	devices_init();
 	buses_init();
 	classes_init();
+	device_isolation_init();
 	firmware_init();
 	hypervisor_init();
 
diff --git a/include/linux/device.h b/include/linux/device.h
index c20dfbf..e11f5c7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -585,6 +585,11 @@ struct device {
 
 	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
 					     override */
+#ifdef CONFIG_DEVICE_ISOLATION
+	struct device_isolation_group *di_group;
+	struct list_head 	di_list;
+#endif
+
 	/* arch specific additions */
 	struct dev_archdata	archdata;
 
diff --git a/include/linux/device_isolation.h b/include/linux/device_isolation.h
new file mode 100644
index 0000000..64af9ac
--- /dev/null
+++ b/include/linux/device_isolation.h
@@ -0,0 +1,124 @@
+#ifndef _DEVICE_ISOLATION_H_
+#define _DEVICE_ISOLATION_H_
+
+#include <linux/kobject.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/device.h>
+
+struct device_isolation_binder {
+	const char *name;
+};
+
+struct device_isolation_group {
+	struct kobject 			kobj;
+	struct list_head 		devices;
+	struct mutex			mutex;
+	bool				allow_driver_match;
+	struct device_isolation_binder	*binder;
+	void 				*binder_priv;
+	uid_t				uid;
+	gid_t				gid;
+	mode_t				mode;
+};
+
+#ifdef CONFIG_DEVICE_ISOLATION
+
+int __init device_isolation_init(void);
+
+int device_isolation_group_init(struct device_isolation_group *group,
+				const char *fmt, ...);
+
+void device_isolation_dev_add(struct device_isolation_group *group,
+			      struct device *dev);
+void device_isolation_dev_remove(struct device *dev);
+int device_isolation_dev_update_sysfs(struct device *dev);
+
+int device_isolation_allow_driver_match(struct device_isolation_group *group);
+int device_isolation_disallow_driver_match(struct device_isolation_group *group);
+
+int device_isolation_bind(struct device_isolation_group *group,
+			  struct device_isolation_binder *binder,
+			  void *priv);
+void device_isolation_unbind(struct device_isolation_group *group,
+			     struct device_isolation_binder *binder);
+
+#else /* CONFIG_DEVICE_ISOLATION */
+
+static inline int __init device_isolation_init(void)
+{
+	return 0;
+}
+
+static inline
+int device_isolation_group_init(struct device_isolation_group *group,
+				const char *fmt, ...)
+{
+	return 0;
+}
+
+static inline
+struct isolation_group *device_isolation_group_new(const char *name)
+{
+	return NULL;
+}
+
+static inline
+void device_isolation_dev_add(struct device_isolation_group *group,
+			      struct device *dev)
+{
+}
+
+static inline
+void device_isolation_dev_remove(struct device *dev)
+{
+}
+
+static inline int device_isolation_dev_update_sysfs(struct device *dev)
+{
+	return 0;
+}
+
+static inline
+int device_isolation_bind(struct device_isolation_group *group,
+			  struct device_isolation_binder *binder,
+			  void *priv)
+{
+	return -ENOSYS;
+}
+
+static inline
+void device_isolation_unbind(struct device_isolation_group *group,
+			     struct device_isolation_binder *binder)
+{
+	BUG();
+}
+
+#endif /* CONFIG_DEVICE_ISOLATION */
+
+static inline
+struct device_isolation_group *device_isolation_group(struct device *dev)
+{
+#ifdef CONFIG_DEVICE_ISOLATION
+	return dev->di_group;
+#else /* CONFIG_DEVICE_ISOLATION */
+	return NULL;
+#endif /* CONFIG_DEVICE_ISOLATION */
+}
+
+static inline bool device_isolation_driver_match_allowed(struct device *dev)
+{
+	struct device_isolation_group *group =
+		device_isolation_group(dev);
+	int ret = true;
+
+	if (group) {
+		mutex_lock(&group->mutex);
+		ret = group->allow_driver_match;
+		mutex_unlock(&group->mutex);
+	}
+
+	return ret;
+}
+
+#endif /* _DEVICE_ISOLATION_H_ */
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges
  2011-12-15  6:25 [RFC] Device isolation infrastructure v2 David Gibson
  2011-12-15  6:25 ` [PATCH 1/3] device_isolation: Infrastructure for managing device isolation groups David Gibson
@ 2011-12-15  6:25 ` David Gibson
  2011-12-15  6:25 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson
  2011-12-15 18:05 ` [RFC] Device isolation infrastructure v2 Alex Williamson
  3 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2011-12-15  6:25 UTC (permalink / raw)
  To: alex.williamson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

This patch adds code to the code for the powernv platform to create
and populate isolation groups on hardware using the p5ioc2 PCI host
bridge used on some IBM POWER systems.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |   14 +++++++++++++-
 arch/powerpc/platforms/powernv/pci.h        |    3 +++
 2 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 4c80f7c..24847a9 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -20,6 +20,7 @@
 #include <linux/irq.h>
 #include <linux/io.h>
 #include <linux/msi.h>
+#include <linux/device_isolation.h>
 
 #include <asm/sections.h>
 #include <asm/io.h>
@@ -88,10 +89,21 @@ static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
 static void __devinit pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
 						   struct pci_dev *pdev)
 {
-	if (phb->p5ioc2.iommu_table.it_map == NULL)
+	if (phb->p5ioc2.iommu_table.it_map == NULL) {
 		iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node);
+#ifdef CONFIG_DEVICE_ISOLATION
+		phb->p5ioc2.di_group = kzalloc(sizeof(*(phb->p5ioc2.di_group)),
+					       GFP_KERNEL);
+		BUG_ON(!phb->p5ioc2.di_group ||
+		       (device_isolation_group_init(phb->p5ioc2.di_group,
+						    "p5ioc2:%llx", phb->opal_id) < 0));
+#endif
+	}
 
 	set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
+#ifdef CONFIG_DEVICE_ISOLATION
+	device_isolation_dev_add(phb->p5ioc2.di_group, &pdev->dev);
+#endif
 }
 
 static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 28ae4ca..a2dc071 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -77,6 +77,9 @@ struct pnv_phb {
 	union {
 		struct {
 			struct iommu_table iommu_table;
+#ifdef CONFIG_DEVICE_ISOLATION
+			struct device_isolation_group *di_group;
+#endif
 		} p5ioc2;
 
 		struct {
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges
  2011-12-15  6:25 [RFC] Device isolation infrastructure v2 David Gibson
  2011-12-15  6:25 ` [PATCH 1/3] device_isolation: Infrastructure for managing device isolation groups David Gibson
  2011-12-15  6:25 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson
@ 2011-12-15  6:25 ` David Gibson
  2011-12-15 18:05 ` [RFC] Device isolation infrastructure v2 Alex Williamson
  3 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2011-12-15  6:25 UTC (permalink / raw)
  To: alex.williamson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

This patch adds code to the code for the powernv platform to create
and populate isolation groups on hardware using the p7ioc (aka IODA) PCI host
bridge used on some IBM POWER systems.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   18 ++++++++++++++++--
 arch/powerpc/platforms/powernv/pci.h      |    6 ++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0cdc8302..6df632e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -20,6 +20,7 @@
 #include <linux/irq.h>
 #include <linux/io.h>
 #include <linux/msi.h>
+#include <linux/device_isolation.h>
 
 #include <asm/sections.h>
 #include <asm/io.h>
@@ -861,6 +862,9 @@ static void __devinit pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
 		set_iommu_table_base(&dev->dev, &pe->tce32_table);
 		if (dev->subordinate)
 			pnv_ioda_setup_bus_dma(pe, dev->subordinate);
+#ifdef CONFIG_DEVICE_ISOLATION
+		device_isolation_dev_add(&pe->di_group, &dev->dev);
+#endif
 	}
 }
 
@@ -941,11 +945,21 @@ static void __devinit pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	}
 	iommu_init_table(tbl, phb->hose->node);
 
-	if (pe->pdev)
+#ifdef CONFIG_DEVICE_ISOLATION
+	BUG_ON(device_isolation_group_init(&pe->di_group, "ioda:rid%x-pe%x",
+					   pe->rid, pe->pe_number) < 0);
+#endif
+
+	if (pe->pdev) {
 		set_iommu_table_base(&pe->pdev->dev, tbl);
-	else
+#ifdef CONFIG_DEVICE_ISOLATION
+		device_isolation_dev_add(&pe->di_group, &pe->pdev->dev);
+#endif
+	} else
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 
+
+
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index a2dc071..d663a26 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -1,6 +1,8 @@
 #ifndef __POWERNV_PCI_H
 #define __POWERNV_PCI_H
 
+#include <linux/device_isolation.h>
+
 struct pci_dn;
 
 enum pnv_phb_type {
@@ -51,6 +53,10 @@ struct pnv_ioda_pe {
 
 	/* Link in list of PE#s */
 	struct list_head	link;
+
+#ifdef CONFIG_DEVICE_ISOLATION
+	struct device_isolation_group di_group;
+#endif
 };
 
 struct pnv_phb {
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-15  6:25 [RFC] Device isolation infrastructure v2 David Gibson
                   ` (2 preceding siblings ...)
  2011-12-15  6:25 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson
@ 2011-12-15 18:05 ` Alex Williamson
  2011-12-15 22:39   ` Alex Williamson
                     ` (2 more replies)
  3 siblings, 3 replies; 23+ messages in thread
From: Alex Williamson @ 2011-12-15 18:05 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> Here's the second spin of my preferred approach to handling grouping
> of devices for safe assignment to guests.
> 
> Changes since v1:
>  * Many name changes and file moves for improved consistency
>  * Bugfixes and cleanups
>  * The interface to the next layer up is considerably fleshed out,
>    although it still needs work.
>  * Example initialization of groups for p5ioc2 and p7ioc.
> 
> TODO:
>  * Need sample initialization of groups for intel and/or amd iommus

I think this very well might imposed significant bloat for those
implementations.  On POWER you typically don't have singleton groups,
while it's the norm on x86.  I don't know that either intel or amd iommu
code have existing structures that they can simply tack the group
pointer to.  Again, this is one of the reasons that I think the current
vfio implementation is the right starting point.  We keep groups within
vfio, imposing zero overhead for systems not making use of it and only
require iommu drivers to implement a trivial function to opt-in.  As we
start to make groups more pervasive in the dma layer, independent of
userspace driver exposure, we can offload pieces to the core.  Starting
with it in the core and hand waving some future use that we don't plan
to implement right now seems like the wrong direction.

>  * Use of sysfs attributes to control group permission is probably a
>    mistake.  Although it seems a bit odd, registering a chardev for
>    each group is probably better, because perms can be set from udev
>    rules, just like everything else.

I agree, this is a horrible mistake.  Reinventing file permissions via
sysfs is bound to be broken and doesn't account for selinux permissions.
Again, I know you don't like aspects of the vfio group management, but
it gets this right imho.

>  * Need more details of what the binder structure will need to
>    contain.
>  * Handle complete removal of groups.
>  * Clarify what will need to happen on the hot unplug path.

We're still also removing devices from the driver model, this means
drivers like vfio need to re-implement a lot of the driver core for
driving each individual device in the group, and as you indicate, it's
unclear what happens in the hotplug path and I wonder if things like
suspend/resume will also require non-standard support.  I really prefer
attaching individual bus drivers to devices using the standard
bind/unbind mechanisms.  I have a hard time seeing how this is an
improvement from vfio.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-15 18:05 ` [RFC] Device isolation infrastructure v2 Alex Williamson
@ 2011-12-15 22:39   ` Alex Williamson
  2011-12-16  1:40   ` David Gibson
  2011-12-16 14:53   ` Joerg Roedel
  2 siblings, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2011-12-15 22:39 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

On Thu, 2011-12-15 at 11:05 -0700, Alex Williamson wrote:
> On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> > Here's the second spin of my preferred approach to handling grouping
> > of devices for safe assignment to guests.
> > 
> > Changes since v1:
> >  * Many name changes and file moves for improved consistency
> >  * Bugfixes and cleanups
> >  * The interface to the next layer up is considerably fleshed out,
> >    although it still needs work.
> >  * Example initialization of groups for p5ioc2 and p7ioc.
> > 
> > TODO:
> >  * Need sample initialization of groups for intel and/or amd iommus
> 
> I think this very well might imposed significant bloat for those
> implementations.  On POWER you typically don't have singleton groups,
> while it's the norm on x86.  I don't know that either intel or amd iommu
> code have existing structures that they can simply tack the group
> pointer to.  Again, this is one of the reasons that I think the current
> vfio implementation is the right starting point.  We keep groups within
> vfio, imposing zero overhead for systems not making use of it and only
> require iommu drivers to implement a trivial function to opt-in.  As we
> start to make groups more pervasive in the dma layer, independent of
> userspace driver exposure, we can offload pieces to the core.  Starting
> with it in the core and hand waving some future use that we don't plan
> to implement right now seems like the wrong direction.
> 
> >  * Use of sysfs attributes to control group permission is probably a
> >    mistake.  Although it seems a bit odd, registering a chardev for
> >    each group is probably better, because perms can be set from udev
> >    rules, just like everything else.
> 
> I agree, this is a horrible mistake.  Reinventing file permissions via
> sysfs is bound to be broken and doesn't account for selinux permissions.
> Again, I know you don't like aspects of the vfio group management, but
> it gets this right imho.
> 
> >  * Need more details of what the binder structure will need to
> >    contain.
> >  * Handle complete removal of groups.
> >  * Clarify what will need to happen on the hot unplug path.
> 
> We're still also removing devices from the driver model, this means
> drivers like vfio need to re-implement a lot of the driver core for
> driving each individual device in the group, and as you indicate, it's
> unclear what happens in the hotplug path and I wonder if things like
> suspend/resume will also require non-standard support.  I really prefer
> attaching individual bus drivers to devices using the standard
> bind/unbind mechanisms.  I have a hard time seeing how this is an
> improvement from vfio.  Thanks,

I should also mention that I just pushed a new version of the vfio
series out to github, it can be found here:

git://github.com/awilliam/linux-vfio.git vfio-next-20111215

This fixes many bugs, including PCI config space access sizes and the
todo item of actually preventing user access to the MSI-X table area.  I
think the vfio-pci driver is much closer to being read to submit now.
It think I've addressed all the previous review comments.  Still pending
is a documentation refresh and some decision about what and how we
expose iommu information.

As usual, the matching qemu tree is here:

git://github.com/awilliam/qemu-vfio.git vfio-ng

I've tested this with an Intel 82576 (both PF and VF), Broadcom BCM5755,
Intel HD Audio controller, and legacy PCI SB Live.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-15 18:05 ` [RFC] Device isolation infrastructure v2 Alex Williamson
  2011-12-15 22:39   ` Alex Williamson
@ 2011-12-16  1:40   ` David Gibson
  2011-12-16  4:49     ` Alex Williamson
  2011-12-16 14:53   ` Joerg Roedel
  2 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2011-12-16  1:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> > Here's the second spin of my preferred approach to handling grouping
> > of devices for safe assignment to guests.
> > 
> > Changes since v1:
> >  * Many name changes and file moves for improved consistency
> >  * Bugfixes and cleanups
> >  * The interface to the next layer up is considerably fleshed out,
> >    although it still needs work.
> >  * Example initialization of groups for p5ioc2 and p7ioc.
> > 
> > TODO:
> >  * Need sample initialization of groups for intel and/or amd iommus
> 
> I think this very well might imposed significant bloat for those
> implementations.  On POWER you typically don't have singleton groups,
> while it's the norm on x86.  I don't know that either intel or amd iommu
> code have existing structures that they can simply tack the group
> pointer to.

Actually, I think they can probably just use the group pointer in the
struct device.  Each PCI function will typically allocate a new group
and put the pointer in the struct device and no-where else.  Devices
hidden under bridges copy the pointer from the bridge parent instead.
I will have to check the unplug path to ensure we can manage the group
lifetime properly, of course.

>  Again, this is one of the reasons that I think the current
> vfio implementation is the right starting point.  We keep groups within
> vfio, imposing zero overhead for systems not making use of it and only
> require iommu drivers to implement a trivial function to opt-in.  As we
> start to make groups more pervasive in the dma layer, independent of
> userspace driver exposure, we can offload pieces to the core.  Starting
> with it in the core and hand waving some future use that we don't plan
> to implement right now seems like the wrong direction.

Well, I think we must agree to disagree here; I think treating groups
as identifiable objects is worthwhile.  That said, I am looking for
ways to whittle down the overhead when they're not in use.

> >  * Use of sysfs attributes to control group permission is probably a
> >    mistake.  Although it seems a bit odd, registering a chardev for
> >    each group is probably better, because perms can be set from udev
> >    rules, just like everything else.
> 
> I agree, this is a horrible mistake.  Reinventing file permissions via
> sysfs is bound to be broken and doesn't account for selinux permissions.
> Again, I know you don't like aspects of the vfio group management, but
> it gets this right imho.

Yeah.  I came up with this because I was trying to avoid registering a
device whose only purpose was to act as a permissioned "handle" on the
group.  But it is a better approach, despite that.  I just wanted to
send out the new patches for comment without waiting to do that
rework.

> >  * Need more details of what the binder structure will need to
> >    contain.
> >  * Handle complete removal of groups.
> >  * Clarify what will need to happen on the hot unplug path.
> 
> We're still also removing devices from the driver model, this means
> drivers like vfio need to re-implement a lot of the driver core for
> driving each individual device in the group,

Ah, so, that's a bit I've yet to flesh out.  The subtle distinction is
that we prevent driver _matching_ not driver _binding.  It's
intentionally still possible to explicitly bind drivers to devices in
the group, by bypassing the automatic match mechanism.

I'm intending that when a group is bound, the binder struct will
(optionally) specify a driver (or several, for different bus types)
which will be "force bound" to all the devices in the group.

> and as you indicate, it's
> unclear what happens in the hotplug path

It's clear enough in concept.  I just have to work out exactly where
we need to call d_i_dev_remove(), and whether we'll need some sort of
callback to the bridge / iommu code.

> and I wonder if things like
> suspend/resume will also require non-standard support.  I really prefer
> attaching individual bus drivers to devices using the standard
> bind/unbind mechanisms.  I have a hard time seeing how this is an
> improvement from vfio.  Thanks,
> 
> Alex
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-16  1:40   ` David Gibson
@ 2011-12-16  4:49     ` Alex Williamson
  2011-12-16  6:00       ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2011-12-16  4:49 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

On Fri, 2011-12-16 at 12:40 +1100, David Gibson wrote:
> On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> > On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> > > Here's the second spin of my preferred approach to handling grouping
> > > of devices for safe assignment to guests.
> > > 
> > > Changes since v1:
> > >  * Many name changes and file moves for improved consistency
> > >  * Bugfixes and cleanups
> > >  * The interface to the next layer up is considerably fleshed out,
> > >    although it still needs work.
> > >  * Example initialization of groups for p5ioc2 and p7ioc.
> > > 
> > > TODO:
> > >  * Need sample initialization of groups for intel and/or amd iommus
> > 
> > I think this very well might imposed significant bloat for those
> > implementations.  On POWER you typically don't have singleton groups,
> > while it's the norm on x86.  I don't know that either intel or amd iommu
> > code have existing structures that they can simply tack the group
> > pointer to.
> 
> Actually, I think they can probably just use the group pointer in the
> struct device.  Each PCI function will typically allocate a new group
> and put the pointer in the struct device and no-where else.  Devices
> hidden under bridges copy the pointer from the bridge parent instead.
> I will have to check the unplug path to ensure we can manage the group
> lifetime properly, of course.
> 
> >  Again, this is one of the reasons that I think the current
> > vfio implementation is the right starting point.  We keep groups within
> > vfio, imposing zero overhead for systems not making use of it and only
> > require iommu drivers to implement a trivial function to opt-in.  As we
> > start to make groups more pervasive in the dma layer, independent of
> > userspace driver exposure, we can offload pieces to the core.  Starting
> > with it in the core and hand waving some future use that we don't plan
> > to implement right now seems like the wrong direction.
> 
> Well, I think we must agree to disagree here; I think treating groups
> as identifiable objects is worthwhile.  That said, I am looking for
> ways to whittle down the overhead when they're not in use.
> 
> > >  * Use of sysfs attributes to control group permission is probably a
> > >    mistake.  Although it seems a bit odd, registering a chardev for
> > >    each group is probably better, because perms can be set from udev
> > >    rules, just like everything else.
> > 
> > I agree, this is a horrible mistake.  Reinventing file permissions via
> > sysfs is bound to be broken and doesn't account for selinux permissions.
> > Again, I know you don't like aspects of the vfio group management, but
> > it gets this right imho.
> 
> Yeah.  I came up with this because I was trying to avoid registering a
> device whose only purpose was to act as a permissioned "handle" on the
> group.  But it is a better approach, despite that.  I just wanted to
> send out the new patches for comment without waiting to do that
> rework.

So we end up with a chardev created by the core, whose only purpose is
setting the group access permissions for userspace usage, which only
becomes useful with something like vfio?  It's an odd conflict that
isolation groups would get involved with userspace permissions to access
the group, but leave enforcement of isolation via iommu groups to the
"binder" driver (where it seems like vfio is still going to need some
kind of merge interface to share a domain between isolation groups).

Is this same chardev going to be a generic conduit for
read/write/mmap/ioctl to the "binder" driver or does vfio need to create
it's own chardev for that?  In the former case, are we ok with a chardev
that has an entirely modular API behind it, or maybe you're planning to
define some basic API infrastructure, in which case this starts smelling
like implementing vfio in the core.  In the latter case (isolation
chardev + vfio chardev) coordinating permissions sounds like a mess.

Alex


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-16  4:49     ` Alex Williamson
@ 2011-12-16  6:00       ` David Gibson
  0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2011-12-16  6:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: aik, benh, joerg.roedel, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel

On Thu, Dec 15, 2011 at 09:49:05PM -0700, Alex Williamson wrote:
> On Fri, 2011-12-16 at 12:40 +1100, David Gibson wrote:
> > On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> > > On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> > > > Here's the second spin of my preferred approach to handling grouping
> > > > of devices for safe assignment to guests.
> > > > 
> > > > Changes since v1:
> > > >  * Many name changes and file moves for improved consistency
> > > >  * Bugfixes and cleanups
> > > >  * The interface to the next layer up is considerably fleshed out,
> > > >    although it still needs work.
> > > >  * Example initialization of groups for p5ioc2 and p7ioc.
> > > > 
> > > > TODO:
> > > >  * Need sample initialization of groups for intel and/or amd iommus
> > > 
> > > I think this very well might imposed significant bloat for those
> > > implementations.  On POWER you typically don't have singleton groups,
> > > while it's the norm on x86.  I don't know that either intel or amd iommu
> > > code have existing structures that they can simply tack the group
> > > pointer to.
> > 
> > Actually, I think they can probably just use the group pointer in the
> > struct device.  Each PCI function will typically allocate a new group
> > and put the pointer in the struct device and no-where else.  Devices
> > hidden under bridges copy the pointer from the bridge parent instead.
> > I will have to check the unplug path to ensure we can manage the group
> > lifetime properly, of course.
> > 
> > >  Again, this is one of the reasons that I think the current
> > > vfio implementation is the right starting point.  We keep groups within
> > > vfio, imposing zero overhead for systems not making use of it and only
> > > require iommu drivers to implement a trivial function to opt-in.  As we
> > > start to make groups more pervasive in the dma layer, independent of
> > > userspace driver exposure, we can offload pieces to the core.  Starting
> > > with it in the core and hand waving some future use that we don't plan
> > > to implement right now seems like the wrong direction.
> > 
> > Well, I think we must agree to disagree here; I think treating groups
> > as identifiable objects is worthwhile.  That said, I am looking for
> > ways to whittle down the overhead when they're not in use.
> > 
> > > >  * Use of sysfs attributes to control group permission is probably a
> > > >    mistake.  Although it seems a bit odd, registering a chardev for
> > > >    each group is probably better, because perms can be set from udev
> > > >    rules, just like everything else.
> > > 
> > > I agree, this is a horrible mistake.  Reinventing file permissions via
> > > sysfs is bound to be broken and doesn't account for selinux permissions.
> > > Again, I know you don't like aspects of the vfio group management, but
> > > it gets this right imho.
> > 
> > Yeah.  I came up with this because I was trying to avoid registering a
> > device whose only purpose was to act as a permissioned "handle" on the
> > group.  But it is a better approach, despite that.  I just wanted to
> > send out the new patches for comment without waiting to do that
> > rework.
> 
> So we end up with a chardev created by the core, whose only purpose is
> setting the group access permissions for userspace usage, which only
> becomes useful with something like vfio?  It's an odd conflict that
> isolation groups would get involved with userspace permissions to access
> the group, but leave enforcement of isolation via iommu groups to the
> "binder" driver

Hm, perhaps.  I'll think about it.

> (where it seems like vfio is still going to need some
> kind of merge interface to share a domain between isolation groups).

That was always going to be the case, but I wish we could stop
thinking of it as the "merge" interface, since I think that term is
distorting thinking about how the interface works.

For example, I think opening a new domain, then adding / removing
groups provides a much cleaner model than "merging' groups without a
separate handle on the iommu domain we're building.

> Is this same chardev going to be a generic conduit for
> read/write/mmap/ioctl to the "binder" driver or does vfio need to create
> it's own chardev for that?

Right, I was thinking that the binder could supply its own fops or
something for the group chardev once the group is bound.

>  In the former case, are we ok with a chardev
> that has an entirely modular API behind it, or maybe you're planning to
> define some basic API infrastructure, in which case this starts smelling
> like implementing vfio in the core.

I think it can work, but I do need to look closer.

>  In the latter case (isolation
> chardev + vfio chardev) coordinating permissions sounds like a mess.

Absolutely.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-15 18:05 ` [RFC] Device isolation infrastructure v2 Alex Williamson
  2011-12-15 22:39   ` Alex Williamson
  2011-12-16  1:40   ` David Gibson
@ 2011-12-16 14:53   ` Joerg Roedel
  2011-12-19  0:11     ` David Gibson
  2 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2011-12-16 14:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: David Gibson, aik, benh, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel, joro

On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> Starting with it in the core and hand waving some future use that we
> don't plan to implement right now seems like the wrong direction.

I agree with Alex. First of all, I havn't seen any real vfio problem
that can't be solved with the current approach, and it has the great
advantage of simplicity. It doesn't require a re-implementation of the
driver-core based on groups. I agree that we need some improvements to
Alex' code for the dma-api layer to solve the problem with broken devices
using the wrong requestor-id. But that can be done incrementally with
the current (current == in the iommu-tree) approach implemented by Alex.

I also think that all this does not belong into the driver core for two
reasons:

	1) The information for building the device groups is provided
	   by the iommu-layer
	2) The group information is provided to vfio by the iommu-api

This makes the iommu-layer the logical point to place the grouping code.
There are some sources outside of the iommu-layer that may influence
grouping (like pci-quirks), but most of the job is done by the
iommu-drivers.

Thanks,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-16 14:53   ` Joerg Roedel
@ 2011-12-19  0:11     ` David Gibson
  2011-12-19 15:41       ` Joerg Roedel
  2011-12-19 15:46       ` David Woodhouse
  0 siblings, 2 replies; 23+ messages in thread
From: David Gibson @ 2011-12-19  0:11 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alex Williamson, aik, benh, dwmw2, chrisw, agraf, scottwood,
	B08248, rusty, iommu, qemu-devel, linux-kernel, joro

On Fri, Dec 16, 2011 at 03:53:53PM +0100, Joerg Roedel wrote:
> On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> > Starting with it in the core and hand waving some future use that we
> > don't plan to implement right now seems like the wrong direction.
> 
> I agree with Alex. First of all, I havn't seen any real vfio problem
> that can't be solved with the current approach, and it has the great
> advantage of simplicity. It doesn't require a re-implementation of the
> driver-core based on groups.

I'm not re-implementing the driver core in terms of groups, just
adding the concept of groups to the driver core.

> I agree that we need some improvements to
> Alex' code for the dma-api layer to solve the problem with broken devices
> using the wrong requestor-id. But that can be done incrementally with
> the current (current == in the iommu-tree) approach implemented by Alex.
> 
> I also think that all this does not belong into the driver core for two
> reasons:
> 
> 	1) The information for building the device groups is provided
> 	   by the iommu-layer

Yes.. no change there.

> 	2) The group information is provided to vfio by the iommu-api

Um.. huh?  Currently, the iommu-api supplies the info vfio, therefore
it should?  I assume there's a non-circular argument you're trying to
make here, but I can't figure out what it is.

> This makes the iommu-layer the logical point to place the grouping
> code.

Well.. that's not where it is in Alex's code either.  The iommu layer
(to the extent that there is such a "layer") supplies the group info,
but the group management is in vfio, not the iommu layer.  With mine
it is in the driver core because the struct device seemed the logical
place for the group id.

Moving the group management into the iommu code itself probably does
make more sense, although I think that would be a change more of code
location than any actual content change.

> There are some sources outside of the iommu-layer that may influence
> grouping (like pci-quirks), but most of the job is done by the
> iommu-drivers.

Right, so, the other problem is that a well boundaried "iommu-driver'
is something that only exists on x86 at present, and the "iommu api"
is riddled with x86-centric thinking.  Or more accurately, design
based on how the current intel and amd iommus work.  On systems like
POWER, use of the iommu is not optional - it's built into the PCI host
bridge and must be initialized when the bridge is probed, much earlier
than iommu driver initialization on x86.  They have no inbuilt concept
of domains (though we could fake in software in some circumstances).

Now, that is something that needs to be fixed longer term.  I'm just
not sure how to deal with that and sorting out some sort of device
isolation / passthrough system.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-19  0:11     ` David Gibson
@ 2011-12-19 15:41       ` Joerg Roedel
  2011-12-21  3:32         ` [Qemu-devel] " David Gibson
  2011-12-19 15:46       ` David Woodhouse
  1 sibling, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2011-12-19 15:41 UTC (permalink / raw)
  To: Alex Williamson, aik, benh, dwmw2, chrisw, agraf, scottwood,
	B08248, rusty, iommu, qemu-devel, linux-kernel, joro

On Mon, Dec 19, 2011 at 11:11:25AM +1100, David Gibson wrote:
> Well.. that's not where it is in Alex's code either.  The iommu layer
> (to the extent that there is such a "layer") supplies the group info,
> but the group management is in vfio, not the iommu layer.  With mine
> it is in the driver core because the struct device seemed the logical
> place for the group id.

Okay, seems we have different ideas of what the 'grouping code' is. I
talked about the group enumeration code only. But group handling code is
certainly important to some degree too. But before we argue about the
right place of the code we should agree on the semantics such code
should provide.

For me it is fine when the code is in VFIO for now, since VFIO is the
only user at the moment. When more users pop up we can easily move it
out to somewhere else. But the semantics influence the interface to
user-space too, so it is more important now. It splits up into a number
of sub problems:

	1) How userspace detects the device<->group relationship?
	2) Do we want group-binding/unbinding to device drivers?
	3) Group attach/detach to iommu-domains?
	4) What to do with hot-added devices?

For 1) I think the current solution with the iommu_group file is fine.
It is somewhat expensive for user-space to figure out the per-group
device-sets, but that is a one-time effort so it doesn't really matter.
Probably we can rename 'iommu_group' to 'isolation_group' or something.

Regarding 2), I think providing user-space a way to unbind groups of
devices from their drivers is a horrible idea. It makes it too easy for
the user to shoot himself in the foot. For example when the user wants
to assign a network card to a guest, but that card is in the same group
as the GPU and the screen wents blank when the guest is started.
Requiring devices to be unbound one-by-one is better because this way
the user always needs to know what he is doing.

For the remaining two questions I think the concept of a default-domain
is helpful.  The default-domain is a per-group domain which is created
by the iommu-driver at initialization time. It is the domain each device
is assigned to when it is not assigned to any other domain (which means
that each device/group is always attached to a domain). The default
domain will be used by the DMA-API layer. This implicitly means, that a
device which is not in the default-domain can't be used with the
dma-api. The dma_supported() function will return false for those
devices.

So what does this mean for point 3? I think we can implement attaching
and detaching groups in the iommu-api. This interface is not exposed to
userspace and can help VFIO and possible future users. Semantic is, that
domain_attach_group() only works when all devices in the group are in
their default domain and domain_detach_group() puts them back into the
default domain.

Question 4) is also solved with the default-domain concept. A hot-added
device is put in the domain of its group automatically. If the group is
owned by VFIO and another driver attaches to the device dma_supported
will return false and initialization will fail.

> Right, so, the other problem is that a well boundaried "iommu-driver'
> is something that only exists on x86 at present, and the "iommu api"
> is riddled with x86-centric thinking.  Or more accurately, design
> based on how the current intel and amd iommus work.  On systems like
> POWER, use of the iommu is not optional - it's built into the PCI host
> bridge and must be initialized when the bridge is probed, much earlier
> than iommu driver initialization on x86.  They have no inbuilt concept
> of domains (though we could fake in software in some circumstances).

Well, the iommu-api was designed for amd-vi and vt-d. But its concepts
turn out to be more general and by no way x86-centric anymore. We
support a couple of ARM platforms too for example. More to come. With
small extensions to the API we will also support GART-like IOMMUs in the
future.
For your hardware the domain-concept will work too. In terms of the
iommu-api a domain is nothing more than an address space. As far as I
understand there is a 1-1 mapping between a hardware iommu and a domain
in your case. The easiest solution then is to share the datastructures
which describe the address space to the hardware between all iommus in a
particular domain.


Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-19  0:11     ` David Gibson
  2011-12-19 15:41       ` Joerg Roedel
@ 2011-12-19 15:46       ` David Woodhouse
  2011-12-19 22:31         ` David Gibson
  1 sibling, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2011-12-19 15:46 UTC (permalink / raw)
  To: David Gibson
  Cc: Joerg Roedel, Alex Williamson, aik, benh, chrisw, agraf,
	scottwood, B08248, rusty, iommu, qemu-devel, linux-kernel, joro

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

On Mon, 2011-12-19 at 11:11 +1100, David Gibson wrote:
>   They have no inbuilt concept
> of domains (though we could fake in software in some circumstances).

That sentence doesn't make much sense to me.

Either you're saying that every device behind a given IOMMU is in *one*
domain (i.e. there's one domain per PCI host bridge), or you're saying
that each device has its *own* domain (maximum isolation, but still
perhaps not really true if you end up with PCIe-to-PCI bridges or broken
hardware such as the ones we've been discovering, where multifunction
devices do their DMA from the wrong function).

Either way, you *do* have domains. You just might not have thought about
it before.

-- 
dwmw2

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-19 15:46       ` David Woodhouse
@ 2011-12-19 22:31         ` David Gibson
  2011-12-19 22:56           ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2011-12-19 22:31 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Joerg Roedel, Alex Williamson, aik, benh, chrisw, agraf,
	scottwood, B08248, rusty, iommu, qemu-devel, linux-kernel, joro

On Mon, Dec 19, 2011 at 03:46:38PM +0000, David Woodhouse wrote:
> On Mon, 2011-12-19 at 11:11 +1100, David Gibson wrote:
> >   They have no inbuilt concept
> > of domains (though we could fake in software in some circumstances).
> 
> That sentence doesn't make much sense to me.
> 
> Either you're saying that every device behind a given IOMMU is in *one*
> domain (i.e. there's one domain per PCI host bridge), or you're saying
> that each device has its *own* domain (maximum isolation, but still
> perhaps not really true if you end up with PCIe-to-PCI bridges or broken
> hardware such as the ones we've been discovering, where multifunction
> devices do their DMA from the wrong function).
> 
> Either way, you *do* have domains. You just might not have thought about
> it before.

Right, sorry, what I mean is that there's no concept of runtime
assignment of devices to domains.  The concept used in the
documentation is a "Partitionable Endpoint" (PE) - which would
correspond to the isolation groups I'm proposing.  These are generally
assigned by firmware based on various hardware dependent isolation
constraints.

When we're running paravirtualized under pHyp, it's impossible to
merge multiple PEs into one domain per se.  We could fake it rather
nastily by replicating all map/unmaps across mutiple PEs.  When
running bare metal, we could do so a bit more nicely by assigning
multiple PEs the same TCE pointer, but we have no mechanism to do so
at present.

Older hardware usually does have just one PE per host bridge, but it
also often has only one slot per host bridge, so in practice is often
both one domain per host bridge _and_ one device per host bridge.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-19 22:31         ` David Gibson
@ 2011-12-19 22:56           ` David Woodhouse
  2011-12-20  0:25             ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2011-12-19 22:56 UTC (permalink / raw)
  To: David Gibson
  Cc: Joerg Roedel, Alex Williamson, aik, benh, chrisw, agraf,
	scottwood, B08248, rusty, iommu, qemu-devel, linux-kernel, joro

[-- Attachment #1: Type: text/plain, Size: 1073 bytes --]

On Tue, 2011-12-20 at 09:31 +1100, David Gibson wrote:
> When we're running paravirtualized under pHyp, it's impossible to
> merge multiple PEs into one domain per se.  We could fake it rather
> nastily by replicating all map/unmaps across mutiple PEs.  When
> running bare metal, we could do so a bit more nicely by assigning
> multiple PEs the same TCE pointer, but we have no mechanism to do so
> at present. 

VT-d does share the page tables, as you could on bare metal. But it's an
implementation detail — there's nothing *fundamentally* wrong with
having to do the map/unmap for each PE, is there? It's only at VM setup
time, so it doesn't really matter if it's slow.

Surely that's the only way you're going to present the guest with the
illusion of having no IOMMU; so that DMA to any given guest physical
address "just works".

On the other hand, perhaps you don't want to do that at all. Perhaps
you're better off presenting a virtualised IOMMU to the guest and
*insisting* that it fully uses it in order to do any DMA at all?

-- 
dwmw2

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Device isolation infrastructure v2
  2011-12-19 22:56           ` David Woodhouse
@ 2011-12-20  0:25             ` David Gibson
  0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2011-12-20  0:25 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Joerg Roedel, Alex Williamson, aik, benh, chrisw, agraf,
	scottwood, B08248, rusty, iommu, qemu-devel, linux-kernel, joro

On Mon, Dec 19, 2011 at 10:56:40PM +0000, David Woodhouse wrote:
> On Tue, 2011-12-20 at 09:31 +1100, David Gibson wrote:
> > When we're running paravirtualized under pHyp, it's impossible to
> > merge multiple PEs into one domain per se.  We could fake it rather
> > nastily by replicating all map/unmaps across mutiple PEs.  When
> > running bare metal, we could do so a bit more nicely by assigning
> > multiple PEs the same TCE pointer, but we have no mechanism to do so
> > at present. 
> 
> VT-d does share the page tables, as you could on bare metal. But it's an
> implementation detail — there's nothing *fundamentally* wrong with
> having to do the map/unmap for each PE, is there? It's only at VM setup
> time, so it doesn't really matter if it's slow.
> 
> Surely that's the only way you're going to present the guest with the
> illusion of having no IOMMU; so that DMA to any given guest physical
> address "just works".
> 
> On the other hand, perhaps you don't want to do that at all. Perhaps
> you're better off presenting a virtualised IOMMU to the guest and
> *insisting* that it fully uses it in order to do any DMA at all?

Not only do we want to, we more or less *have* to.  Existing kernels,
which are used to being paravirt under phyp expect and need a paravirt
iommu.  DMA without iommu setup just doesn't happen.  And the
map/unmap hypercalls are frequently a hot path, so slow does matter.

The other problem is that each domain's IOVA window is often fairly
small, a limitation that would get even worse if we try to put too
many devices in there.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
  2011-12-19 15:41       ` Joerg Roedel
@ 2011-12-21  3:32         ` David Gibson
  2011-12-21  4:30           ` Alex Williamson
  2011-12-21 16:46           ` Joerg Roedel
  0 siblings, 2 replies; 23+ messages in thread
From: David Gibson @ 2011-12-21  3:32 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alex Williamson, aik, benh, dwmw2, chrisw, agraf, scottwood,
	B08248, rusty, iommu, qemu-devel, linux-kernel, joro

On Mon, Dec 19, 2011 at 04:41:56PM +0100, Joerg Roedel wrote:
> On Mon, Dec 19, 2011 at 11:11:25AM +1100, David Gibson wrote:
> > Well.. that's not where it is in Alex's code either.  The iommu layer
> > (to the extent that there is such a "layer") supplies the group info,
> > but the group management is in vfio, not the iommu layer.  With mine
> > it is in the driver core because the struct device seemed the logical
> > place for the group id.
> 
> Okay, seems we have different ideas of what the 'grouping code' is. I
> talked about the group enumeration code only. But group handling code is
> certainly important to some degree too. But before we argue about the
> right place of the code we should agree on the semantics such code
> should provide.
> 
> For me it is fine when the code is in VFIO for now, since VFIO is the
> only user at the moment. When more users pop up we can easily move it
> out to somewhere else. But the semantics influence the interface to
> user-space too, so it is more important now. It splits up into a number
> of sub problems:
> 
> 	1) How userspace detects the device<->group relationship?
> 	2) Do we want group-binding/unbinding to device drivers?
> 	3) Group attach/detach to iommu-domains?
> 	4) What to do with hot-added devices?
> 
> For 1) I think the current solution with the iommu_group file is fine.
> It is somewhat expensive for user-space to figure out the per-group
> device-sets, but that is a one-time effort so it doesn't really matter.
> Probably we can rename 'iommu_group' to 'isolation_group' or
> something.

Hrm.  Alex's group code also provides no in-kernel way to enumerate a
group, short of walking every device in the system.  And it provides
no way to attach information to a group.  It just seems foolish to me
to have this concept without some kind of in-kernel handle on it, and
if you're managing the in-kernel representation you might as well
expose it to userspace there as well.

> Regarding 2), I think providing user-space a way to unbind groups of
> devices from their drivers is a horrible idea.

Well, I'm not wed to unbinding all the drivers at once.  But what I
*do* think is essential is that we can atomically switch off automatic
driver matching for the whole group.  Without that nothing really
stops a driver reattaching to the things you've unbound, so you end up
bailing a leakey boat.

> It makes it too easy for
> the user to shoot himself in the foot. For example when the user wants
> to assign a network card to a guest, but that card is in the same group
> as the GPU and the screen wents blank when the guest is started.
> Requiring devices to be unbound one-by-one is better because this way
> the user always needs to know what he is doing.

Ok, that's not the usage model I had in mind.  What I'm thinking here
is that the admin removes groups that will be used in guests from the
host kernel (probably via boot-time scripts).  The guests will pick
them up later, so that when a guest quits then restarts, we don't have
devices appearing transiently in the host.

> For the remaining two questions I think the concept of a default-domain
> is helpful.  The default-domain is a per-group domain which is created
> by the iommu-driver at initialization time. It is the domain each device
> is assigned to when it is not assigned to any other domain (which means
> that each device/group is always attached to a domain). The default
> domain will be used by the DMA-API layer. This implicitly means, that a
> device which is not in the default-domain can't be used with the
> dma-api. The dma_supported() function will return false for those
> devices.

But.. by definition every device in the group must belong to the same
domain.  So how is this "default domain" in any way different from
"current domain".

In addition making dma_supported() doesn't seem like a strong enough
constraint.  With this a kernel driver which does not use DMA, or
which is initializing and hasn't yet hit a dma_supported() check could
be accessing a device which is in the same group as something a guest
is simultaneously accessing.  Since there's no DMA (on the kernel
side) we can't get DMA conflicts but there are other forms of
isolation that the group could be enforcing which would make that
unsafe. e.g. irqs from the two devices can't be reliably separated,
debug registers on one device let config space be altered to move it
on top of the other, one can cause a bus error which will mess up the
other.

> So what does this mean for point 3? I think we can implement attaching
> and detaching groups in the iommu-api. This interface is not exposed to
> userspace and can help VFIO and possible future users. Semantic is, that
> domain_attach_group() only works when all devices in the group are in
> their default domain and domain_detach_group() puts them back into the
> default domain.

The domain_{attach,detach} functions absolutely should be group based
not device based.  That's what it means to be a group.

> Question 4) is also solved with the default-domain concept. A hot-added
> device is put in the domain of its group automatically.

Well, of course it is.  A group can only have one domain.  That's what
being a group means.

> If the group is
> owned by VFIO and another driver attaches to the device dma_supported
> will return false and initialization will fail.

By the time dma_supported is checked the driver could have already
touched the device.  That's way too late.

> > Right, so, the other problem is that a well boundaried "iommu-driver'
> > is something that only exists on x86 at present, and the "iommu api"
> > is riddled with x86-centric thinking.  Or more accurately, design
> > based on how the current intel and amd iommus work.  On systems like
> > POWER, use of the iommu is not optional - it's built into the PCI host
> > bridge and must be initialized when the bridge is probed, much earlier
> > than iommu driver initialization on x86.  They have no inbuilt concept
> > of domains (though we could fake in software in some circumstances).
> 
> Well, the iommu-api was designed for amd-vi and vt-d. But its concepts
> turn out to be more general and by no way x86-centric anymore.

It's improving, but there are still plenty of x86isms there.

> We
> support a couple of ARM platforms too for example. More to come. With
> small extensions to the API we will also support GART-like IOMMUs in the
> future.
> For your hardware the domain-concept will work too. In terms of the
> iommu-api a domain is nothing more than an address space. As far as I
> understand there is a 1-1 mapping between a hardware iommu and a domain
> in your case. The easiest solution then is to share the datastructures
> which describe the address space to the hardware between all iommus in a
> particular domain.
> 
> 
> Regards,
> 
> 	Joerg
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
  2011-12-21  3:32         ` [Qemu-devel] " David Gibson
@ 2011-12-21  4:30           ` Alex Williamson
  2011-12-21  6:12             ` Aaron Fabbri
  2012-01-25  3:13             ` David Gibson
  2011-12-21 16:46           ` Joerg Roedel
  1 sibling, 2 replies; 23+ messages in thread
From: Alex Williamson @ 2011-12-21  4:30 UTC (permalink / raw)
  To: David Gibson
  Cc: Joerg Roedel, aik, benh, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel, joro

On Wed, 2011-12-21 at 14:32 +1100, David Gibson wrote:
> On Mon, Dec 19, 2011 at 04:41:56PM +0100, Joerg Roedel wrote:
> > On Mon, Dec 19, 2011 at 11:11:25AM +1100, David Gibson wrote:
> > > Well.. that's not where it is in Alex's code either.  The iommu layer
> > > (to the extent that there is such a "layer") supplies the group info,
> > > but the group management is in vfio, not the iommu layer.  With mine
> > > it is in the driver core because the struct device seemed the logical
> > > place for the group id.
> > 
> > Okay, seems we have different ideas of what the 'grouping code' is. I
> > talked about the group enumeration code only. But group handling code is
> > certainly important to some degree too. But before we argue about the
> > right place of the code we should agree on the semantics such code
> > should provide.
> > 
> > For me it is fine when the code is in VFIO for now, since VFIO is the
> > only user at the moment. When more users pop up we can easily move it
> > out to somewhere else. But the semantics influence the interface to
> > user-space too, so it is more important now. It splits up into a number
> > of sub problems:
> > 
> > 	1) How userspace detects the device<->group relationship?
> > 	2) Do we want group-binding/unbinding to device drivers?
> > 	3) Group attach/detach to iommu-domains?
> > 	4) What to do with hot-added devices?
> > 
> > For 1) I think the current solution with the iommu_group file is fine.
> > It is somewhat expensive for user-space to figure out the per-group
> > device-sets, but that is a one-time effort so it doesn't really matter.
> > Probably we can rename 'iommu_group' to 'isolation_group' or
> > something.
> 
> Hrm.  Alex's group code also provides no in-kernel way to enumerate a
> group, short of walking every device in the system.  And it provides
> no way to attach information to a group.  It just seems foolish to me
> to have this concept without some kind of in-kernel handle on it, and

Who else needs to enumerate groups right now?  Who else needs to attach
data to a group.  We seem to be caught in this loop of arguing that we
need driver core based group management, but we don't have any plans to
make use of it, so it just bloats the kernel for most of the users that
don't care about it.

> if you're managing the in-kernel representation you might as well
> expose it to userspace there as well.

Unfortunately this is just the start a peeling back layers of the onion.
We manage groups in the driver core, so the driver core should expose
them to userspace.  The driver core exposes them to userspace, so now it
needs to manage permissions for userspace.  Then we add permissions and
now we need to provide group access, then we need a channel to an actual
userspace device driver, zing! we add a whole API there, then we need
group driver binding, then we need group device driver binding, blam!
another API, then we need...  I don't see a clear end marker that
doesn't continue to bloat the core and add functionality that nobody
else needs and we don't even have plans of integrating more pervasively.
This appears to end with 80 to 90% of the vfio core code moving into the
driver core.

> > Regarding 2), I think providing user-space a way to unbind groups of
> > devices from their drivers is a horrible idea.
> 
> Well, I'm not wed to unbinding all the drivers at once.  But what I
> *do* think is essential is that we can atomically switch off automatic
> driver matching for the whole group.  Without that nothing really
> stops a driver reattaching to the things you've unbound, so you end up
> bailing a leakey boat.

Huh?  There is no issue with removing a device from one driver and
attaching it to another.  This happens all the time.  If you're talking
about hotplug, all we have to do is sit on the bus notifier chain and we
get called when devices are added, before the driver has a chance to
attach.  We can then force a vfio driver to attach when needed.  Hell,
we can just set dev->driver to something to prevent standard driver
probing.

> > It makes it too easy for
> > the user to shoot himself in the foot. For example when the user wants
> > to assign a network card to a guest, but that card is in the same group
> > as the GPU and the screen wents blank when the guest is started.
> > Requiring devices to be unbound one-by-one is better because this way
> > the user always needs to know what he is doing.
> 
> Ok, that's not the usage model I had in mind.  What I'm thinking here
> is that the admin removes groups that will be used in guests from the
> host kernel (probably via boot-time scripts).  The guests will pick
> them up later, so that when a guest quits then restarts, we don't have
> devices appearing transiently in the host.

I don't think that model is dynamic enough for our existing use cases.
A user shouldn't need to decide at boot time which devices are going to
be used for what purpose.  It's entirely valid for a user to start up a
VM, decide they want more network performance and reassign a NIC from
host use to guest use.  When they shutdown the VM or hot-unplug it, they
should be able to easily put it back to work on the host.

> > For the remaining two questions I think the concept of a default-domain
> > is helpful.  The default-domain is a per-group domain which is created
> > by the iommu-driver at initialization time. It is the domain each device
> > is assigned to when it is not assigned to any other domain (which means
> > that each device/group is always attached to a domain). The default
> > domain will be used by the DMA-API layer. This implicitly means, that a
> > device which is not in the default-domain can't be used with the
> > dma-api. The dma_supported() function will return false for those
> > devices.
> 
> But.. by definition every device in the group must belong to the same
> domain.  So how is this "default domain" in any way different from
> "current domain".
> 
> In addition making dma_supported() doesn't seem like a strong enough
> constraint.  With this a kernel driver which does not use DMA, or
> which is initializing and hasn't yet hit a dma_supported() check could
> be accessing a device which is in the same group as something a guest
> is simultaneously accessing.  Since there's no DMA (on the kernel
> side) we can't get DMA conflicts but there are other forms of
> isolation that the group could be enforcing which would make that
> unsafe. e.g. irqs from the two devices can't be reliably separated,
> debug registers on one device let config space be altered to move it
> on top of the other, one can cause a bus error which will mess up the
> other.

I agree, tapping into dma_supported() isn't a strong enough barrier.

> > So what does this mean for point 3? I think we can implement attaching
> > and detaching groups in the iommu-api. This interface is not exposed to
> > userspace and can help VFIO and possible future users. Semantic is, that
> > domain_attach_group() only works when all devices in the group are in
> > their default domain and domain_detach_group() puts them back into the
> > default domain.
> 
> The domain_{attach,detach} functions absolutely should be group based
> not device based.  That's what it means to be a group.

Yet we have no plans to make that change...

> > Question 4) is also solved with the default-domain concept. A hot-added
> > device is put in the domain of its group automatically.
> 
> Well, of course it is.  A group can only have one domain.  That's what
> being a group means.
> 
> > If the group is
> > owned by VFIO and another driver attaches to the device dma_supported
> > will return false and initialization will fail.
> 
> By the time dma_supported is checked the driver could have already
> touched the device.  That's way too late.

Very likely that the driver will have already interfered with config
space.

> > > Right, so, the other problem is that a well boundaried "iommu-driver'
> > > is something that only exists on x86 at present, and the "iommu api"
> > > is riddled with x86-centric thinking.  Or more accurately, design
> > > based on how the current intel and amd iommus work.  On systems like
> > > POWER, use of the iommu is not optional - it's built into the PCI host
> > > bridge and must be initialized when the bridge is probed, much earlier
> > > than iommu driver initialization on x86.  They have no inbuilt concept
> > > of domains (though we could fake in software in some circumstances).
> > 
> > Well, the iommu-api was designed for amd-vi and vt-d. But its concepts
> > turn out to be more general and by no way x86-centric anymore.
> 
> It's improving, but there are still plenty of x86isms there.

Having worked on ia64 for a while, it's interesting to see this x86
bashing from the other side.  Everyone is more than willing to make
architecture neutral interfaces (jeez, look at the extent of the vfio
reworks), but it's not fair to throw away interfaces as x86-centric if
you're not pushing your requirements and making use of the code.

It seems like we'd be better served today to start with the vfio code we
have and let that be the catalyst to drive an iommu api that better
serves non-x86.  I don't see how this group management tangent is really
getting us anywhere.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
  2011-12-21  4:30           ` Alex Williamson
@ 2011-12-21  6:12             ` Aaron Fabbri
  2012-01-25  3:13             ` David Gibson
  1 sibling, 0 replies; 23+ messages in thread
From: Aaron Fabbri @ 2011-12-21  6:12 UTC (permalink / raw)
  To: Alex Williamson, David Gibson
  Cc: chrisw, aik, rusty, agraf, qemu-devel, B08248, iommu,
	Benjamin Herrenschmidt, scottwood, David Woodhouse, linux-kernel




On 12/20/11 8:30 PM, "Alex Williamson" <alex.williamson@redhat.com> wrote:

> On Wed, 2011-12-21 at 14:32 +1100, David Gibson wrote:
>> On Mon, Dec 19, 2011 at 04:41:56PM +0100, Joerg Roedel wrote:
>>> On Mon, Dec 19, 2011 at 11:11:25AM +1100, David Gibson wrote:
<snip>
>>> 
>>> Well, the iommu-api was designed for amd-vi and vt-d. But its concepts
>>> turn out to be more general and by no way x86-centric anymore.
>> 
>> It's improving, but there are still plenty of x86isms there.
> 
> Having worked on ia64 for a while, it's interesting to see this x86
> bashing from the other side.  Everyone is more than willing to make
> architecture neutral interfaces (jeez, look at the extent of the vfio
> reworks), but it's not fair to throw away interfaces as x86-centric if
> you're not pushing your requirements and making use of the code.
> 
> It seems like we'd be better served today to start with the vfio code we
> have and let that be the catalyst to drive an iommu api that better
> serves non-x86.  I don't see how this group management tangent is really
> getting us anywhere.  Thanks,

I'd agree that incremental approach here is key.  VFIO has already seen a
ton of rework to accommodate all architectures.  Let's not bite off a bunch
of these other subsystem rewrites in the same chunk as our VFIO effort.

-Aaron


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
  2011-12-21  3:32         ` [Qemu-devel] " David Gibson
  2011-12-21  4:30           ` Alex Williamson
@ 2011-12-21 16:46           ` Joerg Roedel
  1 sibling, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2011-12-21 16:46 UTC (permalink / raw)
  To: Alex Williamson, aik, benh, dwmw2, chrisw, agraf, scottwood,
	B08248, rusty, iommu, qemu-devel, linux-kernel, joro

On Wed, Dec 21, 2011 at 02:32:35PM +1100, David Gibson wrote:
> On Mon, Dec 19, 2011 at 04:41:56PM +0100, Joerg Roedel wrote:
> > For 1) I think the current solution with the iommu_group file is fine.
> > It is somewhat expensive for user-space to figure out the per-group
> > device-sets, but that is a one-time effort so it doesn't really matter.
> > Probably we can rename 'iommu_group' to 'isolation_group' or
> > something.
> 
> Hrm.  Alex's group code also provides no in-kernel way to enumerate a
> group, short of walking every device in the system.  And it provides
> no way to attach information to a group.  It just seems foolish to me
> to have this concept without some kind of in-kernel handle on it, and
> if you're managing the in-kernel representation you might as well
> expose it to userspace there as well.

I agree that we should have an in-kernel handle for groups in the
future. This would be generic code the iommu-drivers can use to manage
their groups and where the pci-quirks can chime in. But we don't need
that immediatly, lets add this incrementally.
>From the implementation side I would prefer to introduce a dev->iommu
pointer (a generalization of the dev->archdata->iommu pointer we have on
x86, ia64 and arm already) and link this to the handle of the group.
This makes it also easy to walk all devices in a group within the
kernel.
I still disagree that we need to expose this to user-space in any other
way then what we currently have. It is just not worth the effort when it
is used that rarely. It is perfectly fine if user-space has to scan the
sysfs device tree to build its own group data structures.

> > Regarding 2), I think providing user-space a way to unbind groups of
> > devices from their drivers is a horrible idea.
> 
> Well, I'm not wed to unbinding all the drivers at once.  But what I
> *do* think is essential is that we can atomically switch off automatic
> driver matching for the whole group.  Without that nothing really
> stops a driver reattaching to the things you've unbound, so you end up
> bailing a leakey boat.

Okay, makes sense for hotadd to have this. What we need from the
driver-core here is a method to disable driver probing for a device. The
iommu-layer can then call this method on the DEVICE_ADD notifier if
required. This also can be done incrementally to the current VFIO
implementation.

> Ok, that's not the usage model I had in mind.  What I'm thinking here
> is that the admin removes groups that will be used in guests from the
> host kernel (probably via boot-time scripts).  The guests will pick
> them up later, so that when a guest quits then restarts, we don't have
> devices appearing transiently in the host.

This 'remove-groups-from-the-host' thing can easily be done (also at
boot time) by just binding the devices in question to the VFIO driver.
And the devices can stay with VFIO for the lifetime of the system (or
until the host want to make use of it). I don't think we need any
additional logic for that.

> > For the remaining two questions I think the concept of a default-domain
> > is helpful.  The default-domain is a per-group domain which is created
> > by the iommu-driver at initialization time. It is the domain each device
> > is assigned to when it is not assigned to any other domain (which means
> > that each device/group is always attached to a domain). The default
> > domain will be used by the DMA-API layer. This implicitly means, that a
> > device which is not in the default-domain can't be used with the
> > dma-api. The dma_supported() function will return false for those
> > devices.
> 
> But.. by definition every device in the group must belong to the same
> domain.  So how is this "default domain" in any way different from
> "current domain".

The default domain is created by the iommu-drivers for all groups it
finds on the system and the devices are automatically assigned to it at
iommu-driver initialization time. This domain will be used for mapping
dma-api calls.
The "current domain" on the other side can also be a domain created by
VFIO (or KVM as it is today) for mapping the device dma space into a
guest.

> The domain_{attach,detach} functions absolutely should be group based
> not device based.  That's what it means to be a group.

I disagree with that. This would force all iommu-drivers that have no
group-concept (basically all current ARM iommu implementations) to work
on groups too. That makes the iommu-api much harder to use.

We could do instead:

	1) Force users of the iommu-api to bind each device in the group
	   individually.
	2) Attach all devices in the group automatically to the same
	   domain if a single device is attached to it.

I have no strong opinion which way we go, but have a slight favour for
way 2. To make it harder to mis-use we can force that way 2 only works
if the group is in its default-domain and there are no pending
dma-allocations.

> > Question 4) is also solved with the default-domain concept. A hot-added
> > device is put in the domain of its group automatically.
> 
> Well, of course it is.  A group can only have one domain.  That's what
> being a group means.
> 
> > If the group is
> > owned by VFIO and another driver attaches to the device dma_supported
> > will return false and initialization will fail.
> 
> By the time dma_supported is checked the driver could have already
> touched the device.  That's way too late.

Okay, I agree with that, dma_supported is probably not sufficient. But
we can use the method to disable driver-probing instead.

> > Well, the iommu-api was designed for amd-vi and vt-d. But its concepts
> > turn out to be more general and by no way x86-centric anymore.
> 
> It's improving, but there are still plenty of x86isms there.

Hmm, I don't buy that. What x86isms do you mean? And why there are
'plenty'? The iommu-api works on two central concepts: domains and
devices. A domain is an abstraction for an address-space, which, by
definition, every IOMMU hardware provides. And devices are also not a
very x86-centric concept ;)
The concept that a domain can spread multiple devices (or even groups)
can be seen as x86-originated. But so far no non-x86 IOMMU
implementation we have in drivers/iommu has a big problem with that.


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
  2011-12-21  4:30           ` Alex Williamson
  2011-12-21  6:12             ` Aaron Fabbri
@ 2012-01-25  3:13             ` David Gibson
  2012-01-25 23:44               ` Alex Williamson
  1 sibling, 1 reply; 23+ messages in thread
From: David Gibson @ 2012-01-25  3:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Joerg Roedel, aik, benh, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel, joro

On Tue, Dec 20, 2011 at 09:30:37PM -0700, Alex Williamson wrote:
> On Wed, 2011-12-21 at 14:32 +1100, David Gibson wrote:
> > On Mon, Dec 19, 2011 at 04:41:56PM +0100, Joerg Roedel wrote:
> > > On Mon, Dec 19, 2011 at 11:11:25AM +1100, David Gibson wrote:
> > > > Well.. that's not where it is in Alex's code either.  The iommu layer
> > > > (to the extent that there is such a "layer") supplies the group info,
> > > > but the group management is in vfio, not the iommu layer.  With mine
> > > > it is in the driver core because the struct device seemed the logical
> > > > place for the group id.
> > > 
> > > Okay, seems we have different ideas of what the 'grouping code' is. I
> > > talked about the group enumeration code only. But group handling code is
> > > certainly important to some degree too. But before we argue about the
> > > right place of the code we should agree on the semantics such code
> > > should provide.
> > > 
> > > For me it is fine when the code is in VFIO for now, since VFIO is the
> > > only user at the moment. When more users pop up we can easily move it
> > > out to somewhere else. But the semantics influence the interface to
> > > user-space too, so it is more important now. It splits up into a number
> > > of sub problems:
> > > 
> > > 	1) How userspace detects the device<->group relationship?
> > > 	2) Do we want group-binding/unbinding to device drivers?
> > > 	3) Group attach/detach to iommu-domains?
> > > 	4) What to do with hot-added devices?
> > > 
> > > For 1) I think the current solution with the iommu_group file is fine.
> > > It is somewhat expensive for user-space to figure out the per-group
> > > device-sets, but that is a one-time effort so it doesn't really matter.
> > > Probably we can rename 'iommu_group' to 'isolation_group' or
> > > something.
> > 
> > Hrm.  Alex's group code also provides no in-kernel way to enumerate a
> > group, short of walking every device in the system.  And it provides
> > no way to attach information to a group.  It just seems foolish to me
> > to have this concept without some kind of in-kernel handle on it, and
> 
> Who else needs to enumerate groups right now?  Who else needs to attach
> data to a group.  We seem to be caught in this loop of arguing that we
> need driver core based group management, but we don't have any plans to
> make use of it, so it just bloats the kernel for most of the users that
> don't care about it.

So, Ben and I discussed this with David Woodhouse during
linux.conf.au.  He does want to update the core iommu_ops and dma_ops
handling to be group aware, and is willing to do the work for that.
So I will be doing another spin of my isolation code as a basis for
that work of his.  So there's our other user.

> > if you're managing the in-kernel representation you might as well
> > expose it to userspace there as well.
> 
> Unfortunately this is just the start a peeling back layers of the onion.
> We manage groups in the driver core, so the driver core should expose
> them to userspace.  The driver core exposes them to userspace, so now it
> needs to manage permissions for userspace.

That doesn't necessarily follow.  The current draft has the core group
code export character devices on which permissions are managed, but
I'm also considering options where it only exports sysfs and something
else does the character device and permissions.

>  Then we add permissions and
> now we need to provide group access, then we need a channel to an actual
> userspace device driver, zing! we add a whole API there,

And the other option I was looking add had the core providing the char
device but having it's fops get passed straight through to the binder.

> then we need
> group driver binding, then we need group device driver binding, blam!
> another API, then we need...  I don't see a clear end marker that
> doesn't continue to bloat the core and add functionality that nobody
> else needs and we don't even have plans of integrating more pervasively.
> This appears to end with 80 to 90% of the vfio core code moving into the
> driver core.

I don't agree.  Working out the right boundary isn't totally obvious,
certainly, but that doesn't mean a reasonable boundary can't be found.

> 
> > > Regarding 2), I think providing user-space a way to unbind groups of
> > > devices from their drivers is a horrible idea.
> > 
> > Well, I'm not wed to unbinding all the drivers at once.  But what I
> > *do* think is essential is that we can atomically switch off automatic
> > driver matching for the whole group.  Without that nothing really
> > stops a driver reattaching to the things you've unbound, so you end up
> > bailing a leakey boat.
> 
> Huh?  There is no issue with removing a device from one driver and
> attaching it to another.  This happens all the time.  If you're talking
> about hotplug, all we have to do is sit on the bus notifier chain and we
> get called when devices are added, before the driver has a chance to
> attach.  We can then force a vfio driver to attach when needed.

But what is the transition point at which you know force attaching a
vfio driver is the right thing?  My draft does that atomically with
unbinding the existing drivers.  But you have a fair point about that
causing devices to disappear surprisingly, so I think I'll change that
so that it instead becomes 3 step: first prevent new drivers
auto-binding to things in the group, then explicitly unbind existing
drivers, then grab the group for userspace access.

>  Hell,
> we can just set dev->driver to something to prevent standard driver
> probing.

> > > It makes it too easy for
> > > the user to shoot himself in the foot. For example when the user wants
> > > to assign a network card to a guest, but that card is in the same group
> > > as the GPU and the screen wents blank when the guest is started.
> > > Requiring devices to be unbound one-by-one is better because this way
> > > the user always needs to know what he is doing.
> > 
> > Ok, that's not the usage model I had in mind.  What I'm thinking here
> > is that the admin removes groups that will be used in guests from the
> > host kernel (probably via boot-time scripts).  The guests will pick
> > them up later, so that when a guest quits then restarts, we don't have
> > devices appearing transiently in the host.
> 
> I don't think that model is dynamic enough for our existing use cases.
> A user shouldn't need to decide at boot time which devices are going to
> be used for what purpose.  It's entirely valid for a user to start up a
> VM, decide they want more network performance and reassign a NIC from
> host use to guest use.  When they shutdown the VM or hot-unplug it, they
> should be able to easily put it back to work on the host.

Well, sure, you can do that too.  But I bet you the more common usage
model for production will be to have most of the hardware assigned to
a particular guest on a long term basis.

[snip]
> > > So what does this mean for point 3? I think we can implement attaching
> > > and detaching groups in the iommu-api. This interface is not exposed to
> > > userspace and can help VFIO and possible future users. Semantic is, that
> > > domain_attach_group() only works when all devices in the group are in
> > > their default domain and domain_detach_group() puts them back into the
> > > default domain.
> > 
> > The domain_{attach,detach} functions absolutely should be group based
> > not device based.  That's what it means to be a group.
> 
> Yet we have no plans to make that change...

David Woodhouse does now.

> > > Question 4) is also solved with the default-domain concept. A hot-added
> > > device is put in the domain of its group automatically.
> > 
> > Well, of course it is.  A group can only have one domain.  That's what
> > being a group means.
> > 
> > > If the group is
> > > owned by VFIO and another driver attaches to the device dma_supported
> > > will return false and initialization will fail.
> > 
> > By the time dma_supported is checked the driver could have already
> > touched the device.  That's way too late.
> 
> Very likely that the driver will have already interfered with config
> space.

Exactly.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
  2012-01-25  3:13             ` David Gibson
@ 2012-01-25 23:44               ` Alex Williamson
  2012-01-30 23:22                 ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2012-01-25 23:44 UTC (permalink / raw)
  To: David Gibson
  Cc: Joerg Roedel, aik, benh, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel, joro

On Wed, 2012-01-25 at 14:13 +1100, David Gibson wrote:
> On Tue, Dec 20, 2011 at 09:30:37PM -0700, Alex Williamson wrote:
> > On Wed, 2011-12-21 at 14:32 +1100, David Gibson wrote:
> > > On Mon, Dec 19, 2011 at 04:41:56PM +0100, Joerg Roedel wrote:
> > > > On Mon, Dec 19, 2011 at 11:11:25AM +1100, David Gibson wrote:
> > > > > Well.. that's not where it is in Alex's code either.  The iommu layer
> > > > > (to the extent that there is such a "layer") supplies the group info,
> > > > > but the group management is in vfio, not the iommu layer.  With mine
> > > > > it is in the driver core because the struct device seemed the logical
> > > > > place for the group id.
> > > > 
> > > > Okay, seems we have different ideas of what the 'grouping code' is. I
> > > > talked about the group enumeration code only. But group handling code is
> > > > certainly important to some degree too. But before we argue about the
> > > > right place of the code we should agree on the semantics such code
> > > > should provide.
> > > > 
> > > > For me it is fine when the code is in VFIO for now, since VFIO is the
> > > > only user at the moment. When more users pop up we can easily move it
> > > > out to somewhere else. But the semantics influence the interface to
> > > > user-space too, so it is more important now. It splits up into a number
> > > > of sub problems:
> > > > 
> > > > 	1) How userspace detects the device<->group relationship?
> > > > 	2) Do we want group-binding/unbinding to device drivers?
> > > > 	3) Group attach/detach to iommu-domains?
> > > > 	4) What to do with hot-added devices?
> > > > 
> > > > For 1) I think the current solution with the iommu_group file is fine.
> > > > It is somewhat expensive for user-space to figure out the per-group
> > > > device-sets, but that is a one-time effort so it doesn't really matter.
> > > > Probably we can rename 'iommu_group' to 'isolation_group' or
> > > > something.
> > > 
> > > Hrm.  Alex's group code also provides no in-kernel way to enumerate a
> > > group, short of walking every device in the system.  And it provides
> > > no way to attach information to a group.  It just seems foolish to me
> > > to have this concept without some kind of in-kernel handle on it, and
> > 
> > Who else needs to enumerate groups right now?  Who else needs to attach
> > data to a group.  We seem to be caught in this loop of arguing that we
> > need driver core based group management, but we don't have any plans to
> > make use of it, so it just bloats the kernel for most of the users that
> > don't care about it.
> 
> So, Ben and I discussed this with David Woodhouse during
> linux.conf.au.  He does want to update the core iommu_ops and dma_ops
> handling to be group aware, and is willing to do the work for that.
> So I will be doing another spin of my isolation code as a basis for
> that work of his.  So there's our other user.

Hmm...

> > > if you're managing the in-kernel representation you might as well
> > > expose it to userspace there as well.
> > 
> > Unfortunately this is just the start a peeling back layers of the onion.
> > We manage groups in the driver core, so the driver core should expose
> > them to userspace.  The driver core exposes them to userspace, so now it
> > needs to manage permissions for userspace.
> 
> That doesn't necessarily follow.  The current draft has the core group
> code export character devices on which permissions are managed, but
> I'm also considering options where it only exports sysfs and something
> else does the character device and permissions.

Let's start to define it then.  A big problem I have with the isolation
infrastructure you're proposing is that it doesn't have well defined
bounds or interfaces.  It pulls devices out of the normal driver model,
and even goes so far as to start exposing chardevs for groups to
userspace.  The former feels like completely the wrong approach and the
latter oversteps the bounds of a core interface for a very niche usage,
IMO.

Your argument for making groups more pervasive is that the iommu drivers
need to know about them already and we should make use of groups in the
dma_ops and iommu_ops interfaces.  Fine, that seems reasonable and I
assume is what Woodhouse has agreed to work on.  I don't however see
that as an argument for creating a group driver class or isolation API.
In fact, we could probably layer a fairly similar version of vfio on top
of "group enlightened" driver core with minimal changes.

> >  Then we add permissions and
> > now we need to provide group access, then we need a channel to an actual
> > userspace device driver, zing! we add a whole API there,
> 
> And the other option I was looking add had the core providing the char
> device but having it's fops get passed straight through to the binder.

I'm not a fan of that idea.  Core driver code should not be creating
chardevs and I doubt anyone is onboard with the idea of a chardev with
an arbitrary API bound behind it.

> > then we need
> > group driver binding, then we need group device driver binding, blam!
> > another API, then we need...  I don't see a clear end marker that
> > doesn't continue to bloat the core and add functionality that nobody
> > else needs and we don't even have plans of integrating more pervasively.
> > This appears to end with 80 to 90% of the vfio core code moving into the
> > driver core.
> 
> I don't agree.  Working out the right boundary isn't totally obvious,
> certainly, but that doesn't mean a reasonable boundary can't be found.

So let's define it.  Why do you think the driver core should expose
anything more than a sysfs representation of groups to userspace?

> > 
> > > > Regarding 2), I think providing user-space a way to unbind groups of
> > > > devices from their drivers is a horrible idea.
> > > 
> > > Well, I'm not wed to unbinding all the drivers at once.  But what I
> > > *do* think is essential is that we can atomically switch off automatic
> > > driver matching for the whole group.  Without that nothing really
> > > stops a driver reattaching to the things you've unbound, so you end up
> > > bailing a leakey boat.
> > 
> > Huh?  There is no issue with removing a device from one driver and
> > attaching it to another.  This happens all the time.  If you're talking
> > about hotplug, all we have to do is sit on the bus notifier chain and we
> > get called when devices are added, before the driver has a chance to
> > attach.  We can then force a vfio driver to attach when needed.
> 
> But what is the transition point at which you know force attaching a
> vfio driver is the right thing?

It's quite simple.  The vfio bus driver does a bus_register_notifier()
and watches for BUS_NOTIFY_ADD_DEVICE.  At that point it registers the
device with the vfio core.  The vfio core retrieves the group
information for the device and matches it to a struct vfio_group.  When
the user makes use of the vfio group they get file descriptors for each
device and another for the iommu.  vfio therefore simply needs to check
if there are any open file descriptors for devices or iommu to determine
the newly added device needs to be bound to vfio.  There is actually
code for this, feel free to find it in my trees.

> My draft does that atomically with
> unbinding the existing drivers.  But you have a fair point about that
> causing devices to disappear surprisingly, so I think I'll change that
> so that it instead becomes 3 step: first prevent new drivers
> auto-binding to things in the group, then explicitly unbind existing
> drivers, then grab the group for userspace access.

This is where I get confused, even if there are uses in the driver core
for understanding groups, what business does the driver core have for
exposed groups for userspace access?  I think that crosses the logical
boundary and where we start peeling back layers of the onion for
everything that entails.

> >  Hell,
> > we can just set dev->driver to something to prevent standard driver
> > probing.
> 
> > > > It makes it too easy for
> > > > the user to shoot himself in the foot. For example when the user wants
> > > > to assign a network card to a guest, but that card is in the same group
> > > > as the GPU and the screen wents blank when the guest is started.
> > > > Requiring devices to be unbound one-by-one is better because this way
> > > > the user always needs to know what he is doing.
> > > 
> > > Ok, that's not the usage model I had in mind.  What I'm thinking here
> > > is that the admin removes groups that will be used in guests from the
> > > host kernel (probably via boot-time scripts).  The guests will pick
> > > them up later, so that when a guest quits then restarts, we don't have
> > > devices appearing transiently in the host.
> > 
> > I don't think that model is dynamic enough for our existing use cases.
> > A user shouldn't need to decide at boot time which devices are going to
> > be used for what purpose.  It's entirely valid for a user to start up a
> > VM, decide they want more network performance and reassign a NIC from
> > host use to guest use.  When they shutdown the VM or hot-unplug it, they
> > should be able to easily put it back to work on the host.
> 
> Well, sure, you can do that too.  But I bet you the more common usage
> model for production will be to have most of the hardware assigned to
> a particular guest on a long term basis.

I don't doubt that that's the common usage, but it's a very poor model
to aim for and call "good enough".

> [snip]
> > > > So what does this mean for point 3? I think we can implement attaching
> > > > and detaching groups in the iommu-api. This interface is not exposed to
> > > > userspace and can help VFIO and possible future users. Semantic is, that
> > > > domain_attach_group() only works when all devices in the group are in
> > > > their default domain and domain_detach_group() puts them back into the
> > > > default domain.
> > > 
> > > The domain_{attach,detach} functions absolutely should be group based
> > > not device based.  That's what it means to be a group.
> > 
> > Yet we have no plans to make that change...
> 
> David Woodhouse does now.

Great, let's see it.

Thanks,

Alex



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
  2012-01-25 23:44               ` Alex Williamson
@ 2012-01-30 23:22                 ` David Gibson
  0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2012-01-30 23:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Joerg Roedel, aik, benh, dwmw2, chrisw, agraf, scottwood, B08248,
	rusty, iommu, qemu-devel, linux-kernel, joro

On Wed, Jan 25, 2012 at 04:44:53PM -0700, Alex Williamson wrote:
> On Wed, 2012-01-25 at 14:13 +1100, David Gibson wrote:
> > On Tue, Dec 20, 2011 at 09:30:37PM -0700, Alex Williamson wrote:
> > > On Wed, 2011-12-21 at 14:32 +1100, David Gibson wrote:
> > > > On Mon, Dec 19, 2011 at 04:41:56PM +0100, Joerg Roedel wrote:
> > > > > On Mon, Dec 19, 2011 at 11:11:25AM +1100, David Gibson wrote:
> > > > > > Well.. that's not where it is in Alex's code either.  The iommu layer
> > > > > > (to the extent that there is such a "layer") supplies the group info,
> > > > > > but the group management is in vfio, not the iommu layer.  With mine
> > > > > > it is in the driver core because the struct device seemed the logical
> > > > > > place for the group id.
> > > > > 
> > > > > Okay, seems we have different ideas of what the 'grouping code' is. I
> > > > > talked about the group enumeration code only. But group handling code is
> > > > > certainly important to some degree too. But before we argue about the
> > > > > right place of the code we should agree on the semantics such code
> > > > > should provide.
> > > > > 
> > > > > For me it is fine when the code is in VFIO for now, since VFIO is the
> > > > > only user at the moment. When more users pop up we can easily move it
> > > > > out to somewhere else. But the semantics influence the interface to
> > > > > user-space too, so it is more important now. It splits up into a number
> > > > > of sub problems:
> > > > > 
> > > > > 	1) How userspace detects the device<->group relationship?
> > > > > 	2) Do we want group-binding/unbinding to device drivers?
> > > > > 	3) Group attach/detach to iommu-domains?
> > > > > 	4) What to do with hot-added devices?
> > > > > 
> > > > > For 1) I think the current solution with the iommu_group file is fine.
> > > > > It is somewhat expensive for user-space to figure out the per-group
> > > > > device-sets, but that is a one-time effort so it doesn't really matter.
> > > > > Probably we can rename 'iommu_group' to 'isolation_group' or
> > > > > something.
> > > > 
> > > > Hrm.  Alex's group code also provides no in-kernel way to enumerate a
> > > > group, short of walking every device in the system.  And it provides
> > > > no way to attach information to a group.  It just seems foolish to me
> > > > to have this concept without some kind of in-kernel handle on it, and
> > > 
> > > Who else needs to enumerate groups right now?  Who else needs to attach
> > > data to a group.  We seem to be caught in this loop of arguing that we
> > > need driver core based group management, but we don't have any plans to
> > > make use of it, so it just bloats the kernel for most of the users that
> > > don't care about it.
> > 
> > So, Ben and I discussed this with David Woodhouse during
> > linux.conf.au.  He does want to update the core iommu_ops and dma_ops
> > handling to be group aware, and is willing to do the work for that.
> > So I will be doing another spin of my isolation code as a basis for
> > that work of his.  So there's our other user.
> 
> Hmm...
> 
> > > > if you're managing the in-kernel representation you might as well
> > > > expose it to userspace there as well.
> > > 
> > > Unfortunately this is just the start a peeling back layers of the onion.
> > > We manage groups in the driver core, so the driver core should expose
> > > them to userspace.  The driver core exposes them to userspace, so now it
> > > needs to manage permissions for userspace.
> > 
> > That doesn't necessarily follow.  The current draft has the core group
> > code export character devices on which permissions are managed, but
> > I'm also considering options where it only exports sysfs and something
> > else does the character device and permissions.
> 
> Let's start to define it then.  A big problem I have with the isolation
> infrastructure you're proposing is that it doesn't have well defined
> bounds or interfaces.  It pulls devices out of the normal driver
> model,

It doesn't pull them out of the driver model.  The struct devices are
all there just like usual, it's just a flag which inhibits driver binding.

> and even goes so far as to start exposing chardevs for groups to
> userspace.

I'm planning to split the patch so the character devices (and maybe
even the concept of binders) aren't in the core group part, which
should be all dwmw2 needs.

>  The former feels like completely the wrong approach and the
> latter oversteps the bounds of a core interface for a very niche usage,
> IMO.
> 
> Your argument for making groups more pervasive is that the iommu drivers
> need to know about them already and we should make use of groups in the
> dma_ops and iommu_ops interfaces.  Fine, that seems reasonable and I
> assume is what Woodhouse has agreed to work on.  I don't however see
> that as an argument for creating a group driver class or isolation
> API.

No, it's not, hence the split.

> In fact, we could probably layer a fairly similar version of vfio on top
> of "group enlightened" driver core with minimal changes.

Yes, you could.

> > >  Then we add permissions and
> > > now we need to provide group access, then we need a channel to an actual
> > > userspace device driver, zing! we add a whole API there,
> > 
> > And the other option I was looking add had the core providing the char
> > device but having it's fops get passed straight through to the binder.
> 
> I'm not a fan of that idea.  Core driver code should not be creating
> chardevs and I doubt anyone is onboard with the idea of a chardev with
> an arbitrary API bound behind it.

Hm, yeah.   I'm still looking to see if I can think of a way to hang
permissions on these things that's less clunky.

> > > then we need
> > > group driver binding, then we need group device driver binding, blam!
> > > another API, then we need...  I don't see a clear end marker that
> > > doesn't continue to bloat the core and add functionality that nobody
> > > else needs and we don't even have plans of integrating more pervasively.
> > > This appears to end with 80 to 90% of the vfio core code moving into the
> > > driver core.
> > 
> > I don't agree.  Working out the right boundary isn't totally obvious,
> > certainly, but that doesn't mean a reasonable boundary can't be found.
> 
> So let's define it.  Why do you think the driver core should expose
> anything more than a sysfs representation of groups to userspace?

Hrm.  It don't.  You're right the previous draft overextended there.

> > > > > Regarding 2), I think providing user-space a way to unbind groups of
> > > > > devices from their drivers is a horrible idea.
> > > > 
> > > > Well, I'm not wed to unbinding all the drivers at once.  But what I
> > > > *do* think is essential is that we can atomically switch off automatic
> > > > driver matching for the whole group.  Without that nothing really
> > > > stops a driver reattaching to the things you've unbound, so you end up
> > > > bailing a leakey boat.
> > > 
> > > Huh?  There is no issue with removing a device from one driver and
> > > attaching it to another.  This happens all the time.  If you're talking
> > > about hotplug, all we have to do is sit on the bus notifier chain and we
> > > get called when devices are added, before the driver has a chance to
> > > attach.  We can then force a vfio driver to attach when needed.
> > 
> > But what is the transition point at which you know force attaching a
> > vfio driver is the right thing?
> 
> It's quite simple.  The vfio bus driver does a bus_register_notifier()
> and watches for BUS_NOTIFY_ADD_DEVICE.  At that point it registers the
> device with the vfio core.  The vfio core retrieves the group
> information for the device and matches it to a struct vfio_group.  When
> the user makes use of the vfio group they get file descriptors for each
> device and another for the iommu.  vfio therefore simply needs to check
> if there are any open file descriptors for devices or iommu to determine
> the newly added device needs to be bound to vfio.  There is actually
> code for this, feel free to find it in my trees.

Ah, I see, you actually look at what's open at hotplug time.  Ok,
ugly, but I guess it works.

Incidentally though, after more thought I've worked out that having
the group ids be per-bus-type as you do is not merely a somewhat
non-obvious constraint, but actually broken.  There's at least one
really clear case where a group must cross multiple bus-types: where a
IOMMU equipped host bridge to a DMA-capable bus can isolate the bus as
a whole but not individual devices beneath it, and there is another
(non IOMMU equipped) bridge below it to a different DMA capable bus.
In this case the group must encompass everything below the host
bridge, including the subordinate bus of a different type.

> > My draft does that atomically with
> > unbinding the existing drivers.  But you have a fair point about that
> > causing devices to disappear surprisingly, so I think I'll change that
> > so that it instead becomes 3 step: first prevent new drivers
> > auto-binding to things in the group, then explicitly unbind existing
> > drivers, then grab the group for userspace access.
> 
> This is where I get confused, even if there are uses in the driver core
> for understanding groups, what business does the driver core have for
> exposed groups for userspace access?  I think that crosses the logical
> boundary and where we start peeling back layers of the onion for
> everything that entails.
> 
> > >  Hell,
> > > we can just set dev->driver to something to prevent standard driver
> > > probing.
> > 
> > > > > It makes it too easy for
> > > > > the user to shoot himself in the foot. For example when the user wants
> > > > > to assign a network card to a guest, but that card is in the same group
> > > > > as the GPU and the screen wents blank when the guest is started.
> > > > > Requiring devices to be unbound one-by-one is better because this way
> > > > > the user always needs to know what he is doing.
> > > > 
> > > > Ok, that's not the usage model I had in mind.  What I'm thinking here
> > > > is that the admin removes groups that will be used in guests from the
> > > > host kernel (probably via boot-time scripts).  The guests will pick
> > > > them up later, so that when a guest quits then restarts, we don't have
> > > > devices appearing transiently in the host.
> > > 
> > > I don't think that model is dynamic enough for our existing use cases.
> > > A user shouldn't need to decide at boot time which devices are going to
> > > be used for what purpose.  It's entirely valid for a user to start up a
> > > VM, decide they want more network performance and reassign a NIC from
> > > host use to guest use.  When they shutdown the VM or hot-unplug it, they
> > > should be able to easily put it back to work on the host.
> > 
> > Well, sure, you can do that too.  But I bet you the more common usage
> > model for production will be to have most of the hardware assigned to
> > a particular guest on a long term basis.
> 
> I don't doubt that that's the common usage, but it's a very poor model
> to aim for and call "good enough".

Nothing in my proposals precludes your usage model, it just makes the
common usage model less problematic, with devices appearing briefly on
the host while the guest is down.  For your usage model it's just one
extra step - using sysfs to explicitly reassign the group to the host
once the guest (or userspace) is done with it.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-01-30 23:38 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-15  6:25 [RFC] Device isolation infrastructure v2 David Gibson
2011-12-15  6:25 ` [PATCH 1/3] device_isolation: Infrastructure for managing device isolation groups David Gibson
2011-12-15  6:25 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson
2011-12-15  6:25 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson
2011-12-15 18:05 ` [RFC] Device isolation infrastructure v2 Alex Williamson
2011-12-15 22:39   ` Alex Williamson
2011-12-16  1:40   ` David Gibson
2011-12-16  4:49     ` Alex Williamson
2011-12-16  6:00       ` David Gibson
2011-12-16 14:53   ` Joerg Roedel
2011-12-19  0:11     ` David Gibson
2011-12-19 15:41       ` Joerg Roedel
2011-12-21  3:32         ` [Qemu-devel] " David Gibson
2011-12-21  4:30           ` Alex Williamson
2011-12-21  6:12             ` Aaron Fabbri
2012-01-25  3:13             ` David Gibson
2012-01-25 23:44               ` Alex Williamson
2012-01-30 23:22                 ` David Gibson
2011-12-21 16:46           ` Joerg Roedel
2011-12-19 15:46       ` David Woodhouse
2011-12-19 22:31         ` David Gibson
2011-12-19 22:56           ` David Woodhouse
2011-12-20  0:25             ` David Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).