* RFC: Device isolation groups @ 2012-02-01 4:46 David Gibson 2012-02-01 4:46 ` [PATCH 1/3] Device isolation group infrastructure (v3) David Gibson ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: David Gibson @ 2012-02-01 4:46 UTC (permalink / raw) To: dwmw2, iommu Cc: aik, benh, qemu-devel, alex.williamson, joerg.roedel, kvm, linux-kernel This patch series introduces a new infrastructure to the driver core for representing "device isolation groups". That is, groups of devices which can be "isolated" in such a way that the rest of the system can be protected from them, even in the presence of userspace or a guest OS directly driving the devices. Isolation will typically be due to an IOMMU which can safely remap DMA and interrupts coming from these devices. We need to represent whole groups, rather than individual devices, because there are a number of cases where the group can be isolated as a whole, but devices within it cannot be safely isolated from each other - this usually occurs because the IOMMU cannot reliably distinguish which device in the group initiated a transaction. In other words, isolation groups represent the minimum safe granularity for passthrough to guests or userspace. This series provides the core infraustrcture for tracking isolation groups, and example implementations initializing the groups appropriately for two PCI bridges (which include IOMMUs) found on IBM POWER systems. Actually using the group information is not included here, but David Woodhouse has expressed an interest in using a structure like this to represent operations in iommu_ops more correctly. Some tracking of groups is a prerequisite for safe passthrough of devices to guests or userspace, such as done by VFIO. Current VFIO patches use the iommu_ops->device_group mechanism for this. However, that mechanism is awkward, because without an in-kernel concrete representation of groups, enumerating a group requires traversing every device on a given bus type. It also fails to cover some very plausible IOMMU topologies, because its groups cannot span devices on multiple bus types. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] Device isolation group infrastructure (v3) 2012-02-01 4:46 RFC: Device isolation groups David Gibson @ 2012-02-01 4:46 ` David Gibson 2012-02-08 15:27 ` Joerg Roedel 2012-02-01 4:46 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: David Gibson @ 2012-02-01 4:46 UTC (permalink / raw) To: dwmw2, iommu Cc: aik, benh, qemu-devel, alex.williamson, joerg.roedel, kvm, linux-kernel In order to safely drive a device with a userspace driver, or to pass it through to a guest system, we must first make sure that the device is isolated in such a way that it cannot interfere with other devices on the system. This isolation is only available on some systems and will generally require an iommu, and might require other support in bridges or other system hardware. Often, it's not possible to isolate every device from every other device in the system. For example, certain PCI/PCIe bridge configurations mean that an iommu cannot reliably distinguish which device behind the bridge initiated a DMA transaction. Similarly some buggy PCI multifunction devices initiate all DMAs as function 0, so the functions cannot be isolated from each other, even if the IOMMU normally allows this. Therefore, the user, and code to allow userspace drivers or guest passthrough, needs a way to determine which devices can be isolated from which others. This patch adds infrastructure to handle this by introducing the concept of a "device isolation group" - a group of devices which can, as a unit, be safely isolated from the rest of the system and therefore can be, as a unit, safely assigned to an unprivileged used or guest. That is, the groups represent the minimum granularity with which devices may be assigned to untrusted components. This code manages groups, but does not create them or allow use of grouped devices by a guest. Creating groups would be done by iommu or bridge drivers, using the interface this patch provides. It's expected that the groups will be used in future by the in-kernel iommu interface, and would also be used by VFIO or other subsystems to allow safe passthrough of devices to userspace or guests. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- drivers/base/Kconfig | 3 + drivers/base/Makefile | 1 + drivers/base/base.h | 3 + drivers/base/core.c | 6 ++ drivers/base/device_isolation.c | 184 ++++++++++++++++++++++++++++++++++++++ drivers/base/init.c | 2 + include/linux/device.h | 5 + include/linux/device_isolation.h | 100 +++++++++++++++++++++ 8 files changed, 304 insertions(+), 0 deletions(-) create mode 100644 drivers/base/device_isolation.c create mode 100644 include/linux/device_isolation.h diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 7be9f79..a52f2db 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -189,4 +189,7 @@ config DMA_SHARED_BUFFER APIs extension; the file's descriptor can then be passed on to other driver. +config DEVICE_ISOLATION + bool "Enable isolating devices for safe pass-through to guests or user space." + endmenu diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 2c8272d..5daef29 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -19,6 +19,7 @@ obj-$(CONFIG_MODULES) += module.o endif obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o obj-$(CONFIG_REGMAP) += regmap/ +obj-$(CONFIG_DEVICE_ISOLATION) += device_isolation.o ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG diff --git a/drivers/base/base.h b/drivers/base/base.h index b858dfd..713e168 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -25,6 +25,9 @@ * bus_type/class to be statically allocated safely. Nothing outside of the * driver core should ever touch these fields. */ + +#include <linux/device_isolation.h> + struct subsys_private { struct kset subsys; struct kset *devices_kset; diff --git a/drivers/base/core.c b/drivers/base/core.c index 4a67cc0..18edcb1 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -23,6 +23,7 @@ #include <linux/mutex.h> #include <linux/async.h> #include <linux/pm_runtime.h> +#include <linux/device_isolation.h> #include "base.h" #include "power/power.h" @@ -644,6 +645,9 @@ void device_initialize(struct device *dev) lockdep_set_novalidate_class(&dev->mutex); spin_lock_init(&dev->devres_lock); INIT_LIST_HEAD(&dev->devres_head); +#ifdef CONFIG_DEVICE_ISOLATION + dev->di_group = NULL; +#endif device_pm_init(dev); set_dev_node(dev, -1); } @@ -1047,6 +1051,8 @@ int device_add(struct device *dev) class_intf->add_dev(dev, class_intf); mutex_unlock(&dev->class->p->mutex); } + + device_isolation_dev_update_sysfs(dev); done: put_device(dev); return error; diff --git a/drivers/base/device_isolation.c b/drivers/base/device_isolation.c new file mode 100644 index 0000000..4f1f17e --- /dev/null +++ b/drivers/base/device_isolation.c @@ -0,0 +1,184 @@ +/* + * device_isolation.c + * + * Handling of device isolation groups, groups of hardware devices + * which are sufficiently isolated by an IOMMU from the rest of the + * system that they can be safely given (as a unit) to an unprivileged + * user process or guest system to drive. + * + * Copyright (c) 2011 Alexey Kardashevskiy, IBM Corporation + * Copyright (c) 2011 David Gibson, IBM Corporation + * + */ + +#include <linux/fs.h> +#include <linux/slab.h> +#include <linux/device_isolation.h> + +static struct kset *device_isolation_kset; + +struct dig_attribute { + struct attribute attr; + ssize_t (*show)(struct device_isolation_group *group, char *buf); + ssize_t (*store)(struct device_isolation_group *group, const char *buf, + size_t count); +}; + +#define DIG_ATTR(_name, _mode, _show, _store) \ + struct dig_attribute dig_attr_##_name = \ + __ATTR(_name, _mode, _show, _store) + +#define to_dig_attr(_attr) \ + container_of(_attr, struct dig_attribute, attr) + +static ssize_t dig_attr_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + struct dig_attribute *dig_attr = to_dig_attr(attr); + struct device_isolation_group *group = + container_of(kobj, struct device_isolation_group, kobj); + ssize_t ret = -EIO; + + if (dig_attr->show) + ret = dig_attr->show(group, buf); + return ret; +} + +static ssize_t dig_attr_store(struct kobject *kobj, struct attribute *attr, + const char *buf, size_t count) +{ + struct dig_attribute *dig_attr = to_dig_attr(attr); + struct device_isolation_group *group = + container_of(kobj, struct device_isolation_group, kobj); + ssize_t ret = -EIO; + + if (dig_attr->store) + ret = dig_attr->store(group, buf, count); + return ret; +} + +static void dig_release(struct kobject *kobj) +{ + /* FIXME: No way for groups to be removed as yet */ + BUG(); +} + +static const struct sysfs_ops dig_sysfs_ops = { + .show = dig_attr_show, + .store = dig_attr_store, +}; + +static struct kobj_type dig_ktype = { + .sysfs_ops = &dig_sysfs_ops, + .release = dig_release, +}; + +int device_isolation_group_init(struct device_isolation_group *group, + const char *fmt, ...) +{ + int ret; + va_list args; + + kobject_init(&group->kobj, &dig_ktype); + mutex_init(&group->mutex); + INIT_LIST_HEAD(&group->devices); + + group->kobj.kset = device_isolation_kset; + + va_start(args, fmt); + ret = kobject_set_name_vargs(&group->kobj, fmt, args); + va_end(args); + if (ret < 0) { + printk(KERN_ERR "device_isolation: " + "kobject_set_name_vargs() failed\n"); + return ret; + } + + ret = kobject_add(&group->kobj, NULL, NULL); + if (ret < 0) { + printk(KERN_ERR "device_isolation: " + "kobject_add() failed for %s\n", + kobject_name(&group->kobj)); + return ret; + } + + +#define CREATE_ATTR(_attr) \ + do { \ + if (sysfs_create_file(&group->kobj, \ + &dig_attr_##_attr.attr) < 0) \ + printk(KERN_WARNING "device_isolation: create \"" \ + #_attr "\" \failed for %s (errno=%d)\n", \ + kobject_name(&group->kobj), ret); \ + } while (0) + +#undef CREATE_ATTR + + printk(KERN_DEBUG "device_isolation: group %s created\n", + kobject_name(&group->kobj)); + + return 0; +} + +void device_isolation_dev_add(struct device_isolation_group *group, + struct device *dev) +{ + printk(KERN_DEBUG "device_isolation: adding device %s to group %s\n", + kobject_name(&dev->kobj), kobject_name(&group->kobj)); + + mutex_lock(&group->mutex); + list_add_tail(&dev->di_list, &group->devices); + dev->di_group = group; + mutex_unlock(&group->mutex); +} + +void device_isolation_dev_remove(struct device *dev) +{ + struct device_isolation_group *group = dev->di_group; + + BUG_ON(!group); + + mutex_lock(&group->mutex); + list_del(&dev->di_list); + mutex_unlock(&group->mutex); +} + +int device_isolation_dev_update_sysfs(struct device *dev) +{ + int ret; + struct device_isolation_group *group = dev->di_group; + + if (!group) + return 0; + + printk(KERN_DEBUG "device_isolation: updating links for %s in " + "group %s\n", kobject_name(&dev->kobj), + kobject_name(&group->kobj)); + + mutex_lock(&group->mutex); + + ret = sysfs_create_link(&dev->kobj, &group->kobj, "device_isolation_group"); + if (0 > ret) + printk(KERN_WARNING "device_isolation: create device_isolation_group " + "link failed for %s -> %s, errno=%i\n", + kobject_name(&dev->kobj), kobject_name(&group->kobj), ret); + + ret = sysfs_create_link(&group->kobj, &dev->kobj, kobject_name(&dev->kobj)); + if (0 > ret) + printk(KERN_WARNING "device_isolation: create " + "link failed for %s -> %s, errno=%i\n", + kobject_name(&dev->kobj), kobject_name(&group->kobj), + ret); + + mutex_unlock(&group->mutex); + + return ret; +} + +int __init device_isolation_init(void) +{ + device_isolation_kset = kset_create_and_add("isolation", NULL, NULL); + if (!device_isolation_kset) + return -ENOMEM; + return 0; +} diff --git a/drivers/base/init.c b/drivers/base/init.c index c16f0b8..e765717 100644 --- a/drivers/base/init.c +++ b/drivers/base/init.c @@ -8,6 +8,7 @@ #include <linux/device.h> #include <linux/init.h> #include <linux/memory.h> +#include <linux/device_isolation.h> #include "base.h" @@ -24,6 +25,7 @@ void __init driver_init(void) devices_init(); buses_init(); classes_init(); + device_isolation_init(); firmware_init(); hypervisor_init(); diff --git a/include/linux/device.h b/include/linux/device.h index b63fb39..9a2b472 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -667,6 +667,11 @@ struct device { struct dma_coherent_mem *dma_mem; /* internal for coherent mem override */ +#ifdef CONFIG_DEVICE_ISOLATION + struct device_isolation_group *di_group; + struct list_head di_list; +#endif + /* arch specific additions */ struct dev_archdata archdata; diff --git a/include/linux/device_isolation.h b/include/linux/device_isolation.h new file mode 100644 index 0000000..2f0afdc --- /dev/null +++ b/include/linux/device_isolation.h @@ -0,0 +1,100 @@ +#ifndef _DEVICE_ISOLATION_H_ +#define _DEVICE_ISOLATION_H_ + +#include <linux/kobject.h> +#include <linux/list.h> +#include <linux/mutex.h> +#include <linux/device.h> + +struct device_isolation_binder { + const char *name; +}; + +struct device_isolation_group { + struct kobject kobj; + struct list_head devices; + struct mutex mutex; +}; + +#ifdef CONFIG_DEVICE_ISOLATION + +int __init device_isolation_init(void); + +int device_isolation_group_init(struct device_isolation_group *group, + const char *fmt, ...); + +void device_isolation_dev_add(struct device_isolation_group *group, + struct device *dev); +void device_isolation_dev_remove(struct device *dev); +int device_isolation_dev_update_sysfs(struct device *dev); + +int device_isolation_bind(struct device_isolation_group *group, + struct device_isolation_binder *binder, + void *priv); +void device_isolation_unbind(struct device_isolation_group *group, + struct device_isolation_binder *binder); + +#else /* CONFIG_DEVICE_ISOLATION */ + +static inline int __init device_isolation_init(void) +{ + return 0; +} + +static inline +int device_isolation_group_init(struct device_isolation_group *group, + const char *fmt, ...) +{ + return 0; +} + +static inline +struct isolation_group *device_isolation_group_new(const char *name) +{ + return NULL; +} + +static inline +void device_isolation_dev_add(struct device_isolation_group *group, + struct device *dev) +{ +} + +static inline +void device_isolation_dev_remove(struct device *dev) +{ +} + +static inline int device_isolation_dev_update_sysfs(struct device *dev) +{ + return 0; +} + +static inline +int device_isolation_bind(struct device_isolation_group *group, + struct device_isolation_binder *binder, + void *priv) +{ + return -ENOSYS; +} + +static inline +void device_isolation_unbind(struct device_isolation_group *group, + struct device_isolation_binder *binder) +{ + BUG(); +} + +#endif /* CONFIG_DEVICE_ISOLATION */ + +static inline +struct device_isolation_group *device_isolation_group(struct device *dev) +{ +#ifdef CONFIG_DEVICE_ISOLATION + return dev->di_group; +#else /* CONFIG_DEVICE_ISOLATION */ + return NULL; +#endif /* CONFIG_DEVICE_ISOLATION */ +} + +#endif /* _DEVICE_ISOLATION_H_ */ -- 1.7.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] Device isolation group infrastructure (v3) 2012-02-01 4:46 ` [PATCH 1/3] Device isolation group infrastructure (v3) David Gibson @ 2012-02-08 15:27 ` Joerg Roedel 2012-02-08 21:39 ` Benjamin Herrenschmidt ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Joerg Roedel @ 2012-02-08 15:27 UTC (permalink / raw) To: David Gibson Cc: dwmw2, iommu, aik, benh, qemu-devel, alex.williamson, kvm, linux-kernel On Wed, Feb 01, 2012 at 03:46:52PM +1100, David Gibson wrote: > In order to safely drive a device with a userspace driver, or to pass > it through to a guest system, we must first make sure that the device > is isolated in such a way that it cannot interfere with other devices > on the system. This isolation is only available on some systems and > will generally require an iommu, and might require other support in > bridges or other system hardware. > > Often, it's not possible to isolate every device from every other > device in the system. For example, certain PCI/PCIe bridge > configurations mean that an iommu cannot reliably distinguish which > device behind the bridge initiated a DMA transaction. Similarly some > buggy PCI multifunction devices initiate all DMAs as function 0, so > the functions cannot be isolated from each other, even if the IOMMU > normally allows this. > > Therefore, the user, and code to allow userspace drivers or guest > passthrough, needs a way to determine which devices can be isolated > from which others. This patch adds infrastructure to handle this by > introducing the concept of a "device isolation group" - a group of > devices which can, as a unit, be safely isolated from the rest of the > system and therefore can be, as a unit, safely assigned to an > unprivileged used or guest. That is, the groups represent the minimum > granularity with which devices may be assigned to untrusted > components. > > This code manages groups, but does not create them or allow use of > grouped devices by a guest. Creating groups would be done by iommu or > bridge drivers, using the interface this patch provides. It's > expected that the groups will be used in future by the in-kernel iommu > interface, and would also be used by VFIO or other subsystems to allow > safe passthrough of devices to userspace or guests. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > drivers/base/Kconfig | 3 + > drivers/base/Makefile | 1 + > drivers/base/base.h | 3 + > drivers/base/core.c | 6 ++ > drivers/base/device_isolation.c | 184 ++++++++++++++++++++++++++++++++++++++ > drivers/base/init.c | 2 + > include/linux/device.h | 5 + > include/linux/device_isolation.h | 100 +++++++++++++++++++++ Again, device grouping is done by the IOMMU drivers, so this all belongs into the generic iommu-code rather than the driver core. I think it makes sense to introduce a device->iommu pointer which depends on CONFIG_IOMMU_API and put the group information into it. This also has the benefit that we can consolidate all the device->arch.iommu pointers into device->iommu as well. > 8 files changed, 304 insertions(+), 0 deletions(-) > create mode 100644 drivers/base/device_isolation.c > create mode 100644 include/linux/device_isolation.h > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig > index 7be9f79..a52f2db 100644 > --- a/drivers/base/Kconfig > +++ b/drivers/base/Kconfig > @@ -189,4 +189,7 @@ config DMA_SHARED_BUFFER > APIs extension; the file's descriptor can then be passed on to other > driver. > > +config DEVICE_ISOLATION > + bool "Enable isolating devices for safe pass-through to guests or user space." > + No need for a config option. When IOMMU drivers are enabled we also want the group code to be active. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] Device isolation group infrastructure (v3) 2012-02-08 15:27 ` Joerg Roedel @ 2012-02-08 21:39 ` Benjamin Herrenschmidt 2012-02-09 11:28 ` Joerg Roedel 2012-02-08 21:43 ` Benjamin Herrenschmidt 2012-02-09 1:40 ` David Gibson 2 siblings, 1 reply; 18+ messages in thread From: Benjamin Herrenschmidt @ 2012-02-08 21:39 UTC (permalink / raw) To: Joerg Roedel Cc: David Gibson, dwmw2, iommu, aik, qemu-devel, alex.williamson, kvm, linux-kernel On Wed, 2012-02-08 at 16:27 +0100, Joerg Roedel wrote: > Again, device grouping is done by the IOMMU drivers, so this all > belongs > into the generic iommu-code rather than the driver core. Except that there isn't really a "generic iommu code"... discovery, initialization & matching of iommu vs. devices etc... that's all implemented in the arch specific iommu code. Cheers, Ben. > I think it makes sense to introduce a device->iommu pointer which > depends on CONFIG_IOMMU_API and put the group information into it. > This also has the benefit that we can consolidate all the > device->arch.iommu pointers into device->iommu as well. > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] Device isolation group infrastructure (v3) 2012-02-08 21:39 ` Benjamin Herrenschmidt @ 2012-02-09 11:28 ` Joerg Roedel 2012-02-10 0:21 ` David Gibson 0 siblings, 1 reply; 18+ messages in thread From: Joerg Roedel @ 2012-02-09 11:28 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: David Gibson, dwmw2, iommu, aik, qemu-devel, alex.williamson, kvm, linux-kernel On Thu, Feb 09, 2012 at 08:39:28AM +1100, Benjamin Herrenschmidt wrote: > On Wed, 2012-02-08 at 16:27 +0100, Joerg Roedel wrote: > > Again, device grouping is done by the IOMMU drivers, so this all > > belongs > > into the generic iommu-code rather than the driver core. > > Except that there isn't really a "generic iommu code"... discovery, > initialization & matching of iommu vs. devices etc... that's all > implemented in the arch specific iommu code. The whole point of moving the iommu drivers to drivers/iommu was to factor out common code. We are not where we want to be yet but the goal is to move more code to the generic part. For the group-code this means that the generic code should iterate over all devices on a bus and build up group structures based on isolation information provided by the arch specific code. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] Device isolation group infrastructure (v3) 2012-02-09 11:28 ` Joerg Roedel @ 2012-02-10 0:21 ` David Gibson 0 siblings, 0 replies; 18+ messages in thread From: David Gibson @ 2012-02-10 0:21 UTC (permalink / raw) To: Joerg Roedel Cc: Benjamin Herrenschmidt, dwmw2, iommu, aik, qemu-devel, alex.williamson, kvm, linux-kernel On Thu, Feb 09, 2012 at 12:28:05PM +0100, Joerg Roedel wrote: > On Thu, Feb 09, 2012 at 08:39:28AM +1100, Benjamin Herrenschmidt wrote: > > On Wed, 2012-02-08 at 16:27 +0100, Joerg Roedel wrote: > > > Again, device grouping is done by the IOMMU drivers, so this all > > > belongs > > > into the generic iommu-code rather than the driver core. > > > > Except that there isn't really a "generic iommu code"... discovery, > > initialization & matching of iommu vs. devices etc... that's all > > implemented in the arch specific iommu code. > > The whole point of moving the iommu drivers to drivers/iommu was to > factor out common code. We are not where we want to be yet but the goal > is to move more code to the generic part. > > For the group-code this means that the generic code should iterate over > all devices on a bus and build up group structures based on isolation > information provided by the arch specific code. And how exactly do you suggest it provide that information. I really can't see how an iommu driver would specify its isolation constraints generally enough, except in the form of code and then we're back to where we are now. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] Device isolation group infrastructure (v3) 2012-02-08 15:27 ` Joerg Roedel 2012-02-08 21:39 ` Benjamin Herrenschmidt @ 2012-02-08 21:43 ` Benjamin Herrenschmidt 2012-02-09 1:40 ` David Gibson 2 siblings, 0 replies; 18+ messages in thread From: Benjamin Herrenschmidt @ 2012-02-08 21:43 UTC (permalink / raw) To: Joerg Roedel Cc: David Gibson, dwmw2, iommu, aik, qemu-devel, alex.williamson, kvm, linux-kernel On Wed, 2012-02-08 at 16:27 +0100, Joerg Roedel wrote: > Again, device grouping is done by the IOMMU drivers, so this all > belongs > into the generic iommu-code rather than the driver core. > > I think it makes sense to introduce a device->iommu pointer which > depends on CONFIG_IOMMU_API and put the group information into it. > This also has the benefit that we can consolidate all the > device->arch.iommu pointers into device->iommu as well. ... and I pressed sent too quickly before. So not only that, but these patches are simply a mechanism to expose those groups to userspace and allow ownership (ie synchronize with the attachment/detachment of kernel drivers). So this code totally belongs in the driver core. It does -not- address the issue of deciding how the groups are formed, for this, it expects the information to be provided by the arch iommu layer and we'll have to work on that. The way iommu grouping work is too dependent on a given HW setup, you can't really do that generically. Yes, some factors are going to be common, such as the already mentioned ricoh chip, but I think the best we can do here is provide quirks for the iommu code to use. There are capacity limits on how bdfn filtering works on bridges, either CAM size or simply how it is arranged (ie on power for example, I can chose to identify a function, a device, or a range of bus numbers but in the later case it has to be an aligned power of two up to 32), etc... I wouldn't try to solve that generically just yet. Cheers, Ben. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] Device isolation group infrastructure (v3) 2012-02-08 15:27 ` Joerg Roedel 2012-02-08 21:39 ` Benjamin Herrenschmidt 2012-02-08 21:43 ` Benjamin Herrenschmidt @ 2012-02-09 1:40 ` David Gibson 2 siblings, 0 replies; 18+ messages in thread From: David Gibson @ 2012-02-09 1:40 UTC (permalink / raw) To: Joerg Roedel Cc: dwmw2, iommu, aik, benh, qemu-devel, alex.williamson, kvm, linux-kernel On Wed, Feb 08, 2012 at 04:27:48PM +0100, Joerg Roedel wrote: > On Wed, Feb 01, 2012 at 03:46:52PM +1100, David Gibson wrote: > > In order to safely drive a device with a userspace driver, or to pass > > it through to a guest system, we must first make sure that the device > > is isolated in such a way that it cannot interfere with other devices > > on the system. This isolation is only available on some systems and > > will generally require an iommu, and might require other support in > > bridges or other system hardware. > > > > Often, it's not possible to isolate every device from every other > > device in the system. For example, certain PCI/PCIe bridge > > configurations mean that an iommu cannot reliably distinguish which > > device behind the bridge initiated a DMA transaction. Similarly some > > buggy PCI multifunction devices initiate all DMAs as function 0, so > > the functions cannot be isolated from each other, even if the IOMMU > > normally allows this. > > > > Therefore, the user, and code to allow userspace drivers or guest > > passthrough, needs a way to determine which devices can be isolated > > from which others. This patch adds infrastructure to handle this by > > introducing the concept of a "device isolation group" - a group of > > devices which can, as a unit, be safely isolated from the rest of the > > system and therefore can be, as a unit, safely assigned to an > > unprivileged used or guest. That is, the groups represent the minimum > > granularity with which devices may be assigned to untrusted > > components. > > > > This code manages groups, but does not create them or allow use of > > grouped devices by a guest. Creating groups would be done by iommu or > > bridge drivers, using the interface this patch provides. It's > > expected that the groups will be used in future by the in-kernel iommu > > interface, and would also be used by VFIO or other subsystems to allow > > safe passthrough of devices to userspace or guests. > > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > --- > > drivers/base/Kconfig | 3 + > > drivers/base/Makefile | 1 + > > drivers/base/base.h | 3 + > > drivers/base/core.c | 6 ++ > > drivers/base/device_isolation.c | 184 ++++++++++++++++++++++++++++++++++++++ > > drivers/base/init.c | 2 + > > include/linux/device.h | 5 + > > include/linux/device_isolation.h | 100 +++++++++++++++++++++ > > Again, device grouping is done by the IOMMU drivers, so this all belongs > into the generic iommu-code rather than the driver core. > > I think it makes sense to introduce a device->iommu pointer which > depends on CONFIG_IOMMU_API and put the group information into it. > This also has the benefit that we can consolidate all the > device->arch.iommu pointers into device->iommu as well. Well, not quite. In the two example setups in the subsequent patches the grouping is done by the bridge driver, which in these cases is not IOMMU_API aware. They probably should become so, but that's another project - and relies on the IOMMU_API becoming group aware. Note that although iommus are the main source of group constraints, they're not necessarily the only one. Bridge error isolation semantics may also play a part, for one. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges 2012-02-01 4:46 RFC: Device isolation groups David Gibson 2012-02-01 4:46 ` [PATCH 1/3] Device isolation group infrastructure (v3) David Gibson @ 2012-02-01 4:46 ` David Gibson 2012-02-01 18:58 ` Alex Williamson 2012-02-01 4:46 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson 2012-02-01 20:08 ` RFC: Device isolation groups Alex Williamson 3 siblings, 1 reply; 18+ messages in thread From: David Gibson @ 2012-02-01 4:46 UTC (permalink / raw) To: dwmw2, iommu Cc: aik, benh, qemu-devel, alex.williamson, joerg.roedel, kvm, linux-kernel This patch adds code to the code for the powernv platform to create and populate isolation groups on hardware using the p5ioc2 PCI host bridge used on some IBM POWER systems. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- arch/powerpc/platforms/powernv/pci-p5ioc2.c | 14 +++++++++++++- arch/powerpc/platforms/powernv/pci.h | 3 +++ 2 files changed, 16 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c index 2649677..e5bb3a6 100644 --- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c +++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c @@ -20,6 +20,7 @@ #include <linux/irq.h> #include <linux/io.h> #include <linux/msi.h> +#include <linux/device_isolation.h> #include <asm/sections.h> #include <asm/io.h> @@ -88,10 +89,21 @@ static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { } static void __devinit pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev) { - if (phb->p5ioc2.iommu_table.it_map == NULL) + if (phb->p5ioc2.iommu_table.it_map == NULL) { iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node); +#ifdef CONFIG_DEVICE_ISOLATION + phb->p5ioc2.di_group = kzalloc(sizeof(*(phb->p5ioc2.di_group)), + GFP_KERNEL); + BUG_ON(!phb->p5ioc2.di_group || + (device_isolation_group_init(phb->p5ioc2.di_group, + "p5ioc2:%llx", phb->opal_id) < 0)); +#endif + } set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table); +#ifdef CONFIG_DEVICE_ISOLATION + device_isolation_dev_add(phb->p5ioc2.di_group, &pdev->dev); +#endif } static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h index 8bc4796..64ede1e 100644 --- a/arch/powerpc/platforms/powernv/pci.h +++ b/arch/powerpc/platforms/powernv/pci.h @@ -87,6 +87,9 @@ struct pnv_phb { union { struct { struct iommu_table iommu_table; +#ifdef CONFIG_DEVICE_ISOLATION + struct device_isolation_group *di_group; +#endif } p5ioc2; struct { -- 1.7.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges 2012-02-01 4:46 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson @ 2012-02-01 18:58 ` Alex Williamson 2012-02-01 19:15 ` Alex Williamson 0 siblings, 1 reply; 18+ messages in thread From: Alex Williamson @ 2012-02-01 18:58 UTC (permalink / raw) To: David Gibson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote: > This patch adds code to the code for the powernv platform to create > and populate isolation groups on hardware using the p5ioc2 PCI host > bridge used on some IBM POWER systems. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > arch/powerpc/platforms/powernv/pci-p5ioc2.c | 14 +++++++++++++- > arch/powerpc/platforms/powernv/pci.h | 3 +++ > 2 files changed, 16 insertions(+), 1 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c > index 2649677..e5bb3a6 100644 > --- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c > +++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c > @@ -20,6 +20,7 @@ > #include <linux/irq.h> > #include <linux/io.h> > #include <linux/msi.h> > +#include <linux/device_isolation.h> > > #include <asm/sections.h> > #include <asm/io.h> > @@ -88,10 +89,21 @@ static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { } > static void __devinit pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb, > struct pci_dev *pdev) > { > - if (phb->p5ioc2.iommu_table.it_map == NULL) > + if (phb->p5ioc2.iommu_table.it_map == NULL) { > iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node); > +#ifdef CONFIG_DEVICE_ISOLATION > + phb->p5ioc2.di_group = kzalloc(sizeof(*(phb->p5ioc2.di_group)), > + GFP_KERNEL); > + BUG_ON(!phb->p5ioc2.di_group || > + (device_isolation_group_init(phb->p5ioc2.di_group, > + "p5ioc2:%llx", phb->opal_id) < 0)); > +#endif Hmm, it's really unfortunate that this is architected so we need to surround everything in #ifdefs even though we have stub functions defined. > + } > > set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table); > +#ifdef CONFIG_DEVICE_ISOLATION > + device_isolation_dev_add(phb->p5ioc2.di_group, &pdev->dev); > +#endif > } > > static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, > diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h > index 8bc4796..64ede1e 100644 > --- a/arch/powerpc/platforms/powernv/pci.h > +++ b/arch/powerpc/platforms/powernv/pci.h > @@ -87,6 +87,9 @@ struct pnv_phb { > union { > struct { > struct iommu_table iommu_table; > +#ifdef CONFIG_DEVICE_ISOLATION > + struct device_isolation_group *di_group; > +#endif > } p5ioc2; > > struct { ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges 2012-02-01 18:58 ` Alex Williamson @ 2012-02-01 19:15 ` Alex Williamson 0 siblings, 0 replies; 18+ messages in thread From: Alex Williamson @ 2012-02-01 19:15 UTC (permalink / raw) To: David Gibson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Wed, 2012-02-01 at 11:58 -0700, Alex Williamson wrote: > On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote: > > This patch adds code to the code for the powernv platform to create > > and populate isolation groups on hardware using the p5ioc2 PCI host > > bridge used on some IBM POWER systems. > > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > --- > > arch/powerpc/platforms/powernv/pci-p5ioc2.c | 14 +++++++++++++- > > arch/powerpc/platforms/powernv/pci.h | 3 +++ > > 2 files changed, 16 insertions(+), 1 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c > > index 2649677..e5bb3a6 100644 > > --- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c > > +++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c > > @@ -20,6 +20,7 @@ > > #include <linux/irq.h> > > #include <linux/io.h> > > #include <linux/msi.h> > > +#include <linux/device_isolation.h> > > > > #include <asm/sections.h> > > #include <asm/io.h> > > @@ -88,10 +89,21 @@ static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { } > > static void __devinit pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb, > > struct pci_dev *pdev) > > { > > - if (phb->p5ioc2.iommu_table.it_map == NULL) > > + if (phb->p5ioc2.iommu_table.it_map == NULL) { > > iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node); > > +#ifdef CONFIG_DEVICE_ISOLATION > > + phb->p5ioc2.di_group = kzalloc(sizeof(*(phb->p5ioc2.di_group)), > > + GFP_KERNEL); > > + BUG_ON(!phb->p5ioc2.di_group || > > + (device_isolation_group_init(phb->p5ioc2.di_group, > > + "p5ioc2:%llx", phb->opal_id) < 0)); > > +#endif > > Hmm, it's really unfortunate that this is architected so we need to > surround everything in #ifdefs even though we have stub functions > defined. I think maybe we want: #ifdef CONFIG_DEVICE_ISOLATION struct device_isolation_group *device_isolation_create_group(void) { struct device_isolation_group *di_group; di_group = kzalloc(sizeof(*di_group), GFP_KERNEL); if (!di_group) return ERR_PTR(-ENOMEM); return di_group; } #else struct device_isolation_group *device_isolation_create_group(void) { return NULL; } #endif Then we can do: phb->p5ioc2.di_group = device_isolation_create_group(); BUG_ON(IS_ERR(phb->p5ioc2.di_group) || (device_isolation_group_init(phb->p5ioc2.di_group, ... (We pass NULL to the stubs, but that's ok) > > + } > > > > set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table); > > +#ifdef CONFIG_DEVICE_ISOLATION > > + device_isolation_dev_add(phb->p5ioc2.di_group, &pdev->dev); > > +#endif > > } > > > > static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, > > diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h > > index 8bc4796..64ede1e 100644 > > --- a/arch/powerpc/platforms/powernv/pci.h > > +++ b/arch/powerpc/platforms/powernv/pci.h > > @@ -87,6 +87,9 @@ struct pnv_phb { > > union { > > struct { > > struct iommu_table iommu_table; > > +#ifdef CONFIG_DEVICE_ISOLATION > > + struct device_isolation_group *di_group; > > +#endif > > } p5ioc2; > > > > struct { > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges 2012-02-01 4:46 RFC: Device isolation groups David Gibson 2012-02-01 4:46 ` [PATCH 1/3] Device isolation group infrastructure (v3) David Gibson 2012-02-01 4:46 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson @ 2012-02-01 4:46 ` David Gibson 2012-02-01 19:17 ` Alex Williamson 2012-02-01 20:08 ` RFC: Device isolation groups Alex Williamson 3 siblings, 1 reply; 18+ messages in thread From: David Gibson @ 2012-02-01 4:46 UTC (permalink / raw) To: dwmw2, iommu Cc: aik, benh, qemu-devel, alex.williamson, joerg.roedel, kvm, linux-kernel This patch adds code to the code for the powernv platform to create and populate isolation groups on hardware using the p7ioc (aka IODA) PCI host bridge used on some IBM POWER systems. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- arch/powerpc/platforms/powernv/pci-ioda.c | 18 ++++++++++++++++-- arch/powerpc/platforms/powernv/pci.h | 6 ++++++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 5e155df..4648475 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -20,6 +20,7 @@ #include <linux/irq.h> #include <linux/io.h> #include <linux/msi.h> +#include <linux/device_isolation.h> #include <asm/sections.h> #include <asm/io.h> @@ -877,6 +878,9 @@ static void __devinit pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, set_iommu_table_base(&dev->dev, &pe->tce32_table); if (dev->subordinate) pnv_ioda_setup_bus_dma(pe, dev->subordinate); +#ifdef CONFIG_DEVICE_ISOLATION + device_isolation_dev_add(&pe->di_group, &dev->dev); +#endif } } @@ -957,11 +961,21 @@ static void __devinit pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, } iommu_init_table(tbl, phb->hose->node); - if (pe->pdev) +#ifdef CONFIG_DEVICE_ISOLATION + BUG_ON(device_isolation_group_init(&pe->di_group, "ioda:rid%x-pe%x", + pe->rid, pe->pe_number) < 0); +#endif + + if (pe->pdev) { set_iommu_table_base(&pe->pdev->dev, tbl); - else +#ifdef CONFIG_DEVICE_ISOLATION + device_isolation_dev_add(&pe->di_group, &pe->pdev->dev); +#endif + } else pnv_ioda_setup_bus_dma(pe, pe->pbus); + + return; fail: /* XXX Failure: Try to fallback to 64-bit only ? */ diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h index 64ede1e..3e282b7 100644 --- a/arch/powerpc/platforms/powernv/pci.h +++ b/arch/powerpc/platforms/powernv/pci.h @@ -1,6 +1,8 @@ #ifndef __POWERNV_PCI_H #define __POWERNV_PCI_H +#include <linux/device_isolation.h> + struct pci_dn; enum pnv_phb_type { @@ -60,6 +62,10 @@ struct pnv_ioda_pe { /* Link in list of PE#s */ struct list_head link; + +#ifdef CONFIG_DEVICE_ISOLATION + struct device_isolation_group di_group; +#endif }; struct pnv_phb { -- 1.7.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges 2012-02-01 4:46 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson @ 2012-02-01 19:17 ` Alex Williamson 2012-02-02 0:23 ` David Gibson 0 siblings, 1 reply; 18+ messages in thread From: Alex Williamson @ 2012-02-01 19:17 UTC (permalink / raw) To: David Gibson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote: > This patch adds code to the code for the powernv platform to create > and populate isolation groups on hardware using the p7ioc (aka IODA) PCI host > bridge used on some IBM POWER systems. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > arch/powerpc/platforms/powernv/pci-ioda.c | 18 ++++++++++++++++-- > arch/powerpc/platforms/powernv/pci.h | 6 ++++++ > 2 files changed, 22 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > index 5e155df..4648475 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -20,6 +20,7 @@ > #include <linux/irq.h> > #include <linux/io.h> > #include <linux/msi.h> > +#include <linux/device_isolation.h> > > #include <asm/sections.h> > #include <asm/io.h> > @@ -877,6 +878,9 @@ static void __devinit pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, > set_iommu_table_base(&dev->dev, &pe->tce32_table); > if (dev->subordinate) > pnv_ioda_setup_bus_dma(pe, dev->subordinate); > +#ifdef CONFIG_DEVICE_ISOLATION > + device_isolation_dev_add(&pe->di_group, &dev->dev); > +#endif > } > } > > @@ -957,11 +961,21 @@ static void __devinit pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, > } > iommu_init_table(tbl, phb->hose->node); > > - if (pe->pdev) > +#ifdef CONFIG_DEVICE_ISOLATION > + BUG_ON(device_isolation_group_init(&pe->di_group, "ioda:rid%x-pe%x", > + pe->rid, pe->pe_number) < 0); > +#endif > + > + if (pe->pdev) { > set_iommu_table_base(&pe->pdev->dev, tbl); > - else > +#ifdef CONFIG_DEVICE_ISOLATION > + device_isolation_dev_add(&pe->di_group, &pe->pdev->dev); > +#endif > + } else > pnv_ioda_setup_bus_dma(pe, pe->pbus); > Blech, #ifdefs. > + > + > return; > fail: > /* XXX Failure: Try to fallback to 64-bit only ? */ > diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h > index 64ede1e..3e282b7 100644 > --- a/arch/powerpc/platforms/powernv/pci.h > +++ b/arch/powerpc/platforms/powernv/pci.h > @@ -1,6 +1,8 @@ > #ifndef __POWERNV_PCI_H > #define __POWERNV_PCI_H > > +#include <linux/device_isolation.h> > + > struct pci_dn; > > enum pnv_phb_type { > @@ -60,6 +62,10 @@ struct pnv_ioda_pe { > > /* Link in list of PE#s */ > struct list_head link; > + > +#ifdef CONFIG_DEVICE_ISOLATION > + struct device_isolation_group di_group; > +#endif Embedding the struct means we need to know the size, which means we can't get rid of the #ifdef. Probably better to use a pointer if we don't mind adding a few bytes in the #ifndef case. Thanks, Alex > }; > > struct pnv_phb { ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges 2012-02-01 19:17 ` Alex Williamson @ 2012-02-02 0:23 ` David Gibson 0 siblings, 0 replies; 18+ messages in thread From: David Gibson @ 2012-02-02 0:23 UTC (permalink / raw) To: Alex Williamson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Wed, Feb 01, 2012 at 12:17:05PM -0700, Alex Williamson wrote: > On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote: > > This patch adds code to the code for the powernv platform to create > > and populate isolation groups on hardware using the p7ioc (aka IODA) PCI host > > bridge used on some IBM POWER systems. > > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > --- > > arch/powerpc/platforms/powernv/pci-ioda.c | 18 ++++++++++++++++-- > > arch/powerpc/platforms/powernv/pci.h | 6 ++++++ > > 2 files changed, 22 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > > index 5e155df..4648475 100644 > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > > @@ -20,6 +20,7 @@ > > #include <linux/irq.h> > > #include <linux/io.h> > > #include <linux/msi.h> > > +#include <linux/device_isolation.h> > > > > #include <asm/sections.h> > > #include <asm/io.h> > > @@ -877,6 +878,9 @@ static void __devinit pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, > > set_iommu_table_base(&dev->dev, &pe->tce32_table); > > if (dev->subordinate) > > pnv_ioda_setup_bus_dma(pe, dev->subordinate); > > +#ifdef CONFIG_DEVICE_ISOLATION > > + device_isolation_dev_add(&pe->di_group, &dev->dev); > > +#endif > > } > > } > > > > @@ -957,11 +961,21 @@ static void __devinit pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, > > } > > iommu_init_table(tbl, phb->hose->node); > > > > - if (pe->pdev) > > +#ifdef CONFIG_DEVICE_ISOLATION > > + BUG_ON(device_isolation_group_init(&pe->di_group, "ioda:rid%x-pe%x", > > + pe->rid, pe->pe_number) < 0); > > +#endif > > + > > + if (pe->pdev) { > > set_iommu_table_base(&pe->pdev->dev, tbl); > > - else > > +#ifdef CONFIG_DEVICE_ISOLATION > > + device_isolation_dev_add(&pe->di_group, &pe->pdev->dev); > > +#endif > > + } else > > pnv_ioda_setup_bus_dma(pe, pe->pbus); > > Blech, #ifdefs. Hm, yeah. The problem is the di_group member not even existing when !DEVICE_ISOLATION. Might be able to avoid that with an empty structure in that case. > > + > > + > > return; > > fail: > > /* XXX Failure: Try to fallback to 64-bit only ? */ > > diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h > > index 64ede1e..3e282b7 100644 > > --- a/arch/powerpc/platforms/powernv/pci.h > > +++ b/arch/powerpc/platforms/powernv/pci.h > > @@ -1,6 +1,8 @@ > > #ifndef __POWERNV_PCI_H > > #define __POWERNV_PCI_H > > > > +#include <linux/device_isolation.h> > > + > > struct pci_dn; > > > > enum pnv_phb_type { > > @@ -60,6 +62,10 @@ struct pnv_ioda_pe { > > > > /* Link in list of PE#s */ > > struct list_head link; > > + > > +#ifdef CONFIG_DEVICE_ISOLATION > > + struct device_isolation_group di_group; > > +#endif > > Embedding the struct means we need to know the size, which means we > can't get rid of the #ifdef. Probably better to use a pointer if we > don't mind adding a few bytes in the #ifndef case. Thanks, I've been back and forth a few types on this, and I've convinced myself that allowing the group structure to be embedded is a better idea. It's a particular help when you need to construct one from platform or bridge init code that runs before mem_init_done. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: RFC: Device isolation groups 2012-02-01 4:46 RFC: Device isolation groups David Gibson ` (2 preceding siblings ...) 2012-02-01 4:46 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson @ 2012-02-01 20:08 ` Alex Williamson 2012-02-02 1:24 ` David Gibson 3 siblings, 1 reply; 18+ messages in thread From: Alex Williamson @ 2012-02-01 20:08 UTC (permalink / raw) To: David Gibson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote: > This patch series introduces a new infrastructure to the driver core > for representing "device isolation groups". That is, groups of > devices which can be "isolated" in such a way that the rest of the > system can be protected from them, even in the presence of userspace > or a guest OS directly driving the devices. > > Isolation will typically be due to an IOMMU which can safely remap DMA > and interrupts coming from these devices. We need to represent whole > groups, rather than individual devices, because there are a number of > cases where the group can be isolated as a whole, but devices within > it cannot be safely isolated from each other - this usually occurs > because the IOMMU cannot reliably distinguish which device in the > group initiated a transaction. In other words, isolation groups > represent the minimum safe granularity for passthrough to guests or > userspace. > > This series provides the core infraustrcture for tracking isolation > groups, and example implementations initializing the groups > appropriately for two PCI bridges (which include IOMMUs) found on IBM > POWER systems. > > Actually using the group information is not included here, but David > Woodhouse has expressed an interest in using a structure like this to > represent operations in iommu_ops more correctly. > > Some tracking of groups is a prerequisite for safe passthrough of > devices to guests or userspace, such as done by VFIO. Current VFIO > patches use the iommu_ops->device_group mechanism for this. However, > that mechanism is awkward, because without an in-kernel concrete > representation of groups, enumerating a group requires traversing > every device on a given bus type. It also fails to cover some very > plausible IOMMU topologies, because its groups cannot span devices on > multiple bus types. So far so good, but there's not much meat on the bone yet. The sysfs linking and a list of devices in a group is all pretty straight forward and obvious. I'm not sure yet how this solves the DMA quirks kind of issues though. For instance if we have the ricoh device that uses the wrong source ID for DMA from function 1 and we put functions 0 & 1 in an isolation group... then what? And who does device quirk grouping? Each IOMMU driver? For the iommu_device_group() interface, I had imagined that we'd have something like: struct device *device_dma_alias_quirk(struct device *dev) { if (<is broken ricoh func 1) return <ricoh func0>; return dev; } Then iommu_device_group turns into: int iommu_device_group(struct device *dev, unsigned int *groupid) { dev = device_dma_alias_quirk(dev); if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group) return dev->bus->iommu_ops->device_group(dev, groupid); return -ENODEV; } and device_dma_alias_quirk() is available for dma_ops too. So maybe a struct device_isolation_group not only needs a list of devices, but it also needs the representative device to do mappings identified. dma_ops would then just use dev->di_group->dma_dev for mappings, and I assume we call iommu_alloc() with a di_group and instead of iommu_attach/detach_device, we'd have iommu_attach/detach_group? What I'm really curious about is where you now stand on what's going to happen in device_isolation_bind(). How do we get from a device in sysfs pointing to a group to something like vfio binding to that group and creating a chardev to access it? Are we manipulating automatic driver binding or existing bound drivers once a group is bound? Do isolation groups enforce isolation, or just describe it? Thanks, Alex ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: RFC: Device isolation groups 2012-02-01 20:08 ` RFC: Device isolation groups Alex Williamson @ 2012-02-02 1:24 ` David Gibson 2012-02-29 19:30 ` Alex Williamson 0 siblings, 1 reply; 18+ messages in thread From: David Gibson @ 2012-02-02 1:24 UTC (permalink / raw) To: Alex Williamson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Wed, Feb 01, 2012 at 01:08:39PM -0700, Alex Williamson wrote: > On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote: > > This patch series introduces a new infrastructure to the driver core > > for representing "device isolation groups". That is, groups of > > devices which can be "isolated" in such a way that the rest of the > > system can be protected from them, even in the presence of userspace > > or a guest OS directly driving the devices. > > > > Isolation will typically be due to an IOMMU which can safely remap DMA > > and interrupts coming from these devices. We need to represent whole > > groups, rather than individual devices, because there are a number of > > cases where the group can be isolated as a whole, but devices within > > it cannot be safely isolated from each other - this usually occurs > > because the IOMMU cannot reliably distinguish which device in the > > group initiated a transaction. In other words, isolation groups > > represent the minimum safe granularity for passthrough to guests or > > userspace. > > > > This series provides the core infraustrcture for tracking isolation > > groups, and example implementations initializing the groups > > appropriately for two PCI bridges (which include IOMMUs) found on IBM > > POWER systems. > > > > Actually using the group information is not included here, but David > > Woodhouse has expressed an interest in using a structure like this to > > represent operations in iommu_ops more correctly. > > > > Some tracking of groups is a prerequisite for safe passthrough of > > devices to guests or userspace, such as done by VFIO. Current VFIO > > patches use the iommu_ops->device_group mechanism for this. However, > > that mechanism is awkward, because without an in-kernel concrete > > representation of groups, enumerating a group requires traversing > > every device on a given bus type. It also fails to cover some very > > plausible IOMMU topologies, because its groups cannot span devices on > > multiple bus types. > > So far so good, but there's not much meat on the bone yet. True.. > The sysfs > linking and a list of devices in a group is all pretty straight forward > and obvious. I'm not sure yet how this solves the DMA quirks kind of > issues though. It doesn't, yet. > For instance if we have the ricoh device that uses the > wrong source ID for DMA from function 1 and we put functions 0 & 1 in an > isolation group... then what? And who does device quirk grouping? Each > IOMMU driver? I think so. It'd be nicer to have this in a common place, but I can't see a reasonable way of doing that - the IOMMU driver really needs to have control over group allocation. We can make it easy for the IOMMU drivers, though by having a common header quirk and flag which the IOMMU driver can check. The other thing I'm considering, is if we actually should take a whitelist approach, rather than a blacklist approach. Chances are that when you factor in various debug registers many, maybe most, multifunction devices won't actually safely isolate the functions from each other. So it might be a better approach to have IOMMU drivers generally treat a single slot as a group unless the specific device is whitelisted as having function isolation (and SR-IOV VFs would be whitelisted, of course). > For the iommu_device_group() interface, I had imagined that we'd have > something like: > > struct device *device_dma_alias_quirk(struct device *dev) > { > if (<is broken ricoh func 1) > return <ricoh func0>; > > return dev; > } > > Then iommu_device_group turns into: > > int iommu_device_group(struct device *dev, unsigned int *groupid) > { > dev = device_dma_alias_quirk(dev); > if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group) > return dev->bus->iommu_ops->device_group(dev, groupid); > > return -ENODEV; > } > > and device_dma_alias_quirk() is available for dma_ops too. > > So maybe a struct device_isolation_group not only needs a list of > devices, but it also needs the representative device to do mappings > identified. Perhaps. For now, I was assuming just taking the first list element would suffice for these cases. > dma_ops would then just use dev->di_group->dma_dev for > mappings, and I assume we call iommu_alloc() with a di_group and instead > of iommu_attach/detach_device, we'd have iommu_attach/detach_group? That's the idea, yes. > What I'm really curious about is where you now stand on what's going to > happen in device_isolation_bind(). How do we get from a device in sysfs > pointing to a group to something like vfio binding to that group and > creating a chardev to access it? Are we manipulating automatic driver > binding or existing bound drivers once a group is bound? Do isolation > groups enforce isolation, or just describe it? Thanks, So, I'm in the middle of rethinking this in terms of what states the group can be in. At minimum we'll need "host kernel" state, where host drivers bind normally to devices in the group and "opened by guest" state where the group is in active use by a guest or userspace. Clearly you can't leave opened by guest state when something has VFIO fds or equivalents open, and you can't enter opened by guest state when any host kernel drivers are bound to devices in the group. We're going to want some transitional states in between those, but what they should look like is what I'm rethinking. The previous drafts had just one state in between; "idle" where host drivers are excluded from the group but a guest isn't using it yet. Both you and David Woodhouse have expressed concern about the surprisingness of a bunch of host devices vanishing suddenly under that model. So that suggests a "leaving host" state where new driver bounds are prevented, but there might be existing bound drivers. It would move to "idle" state when the last host driver is unbound. Additionally I'm not sure if it makes sense to have a distinction between "assigned to guest" and "opened by guest" states. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: RFC: Device isolation groups 2012-02-02 1:24 ` David Gibson @ 2012-02-29 19:30 ` Alex Williamson 2012-03-09 3:40 ` David Gibson 0 siblings, 1 reply; 18+ messages in thread From: Alex Williamson @ 2012-02-29 19:30 UTC (permalink / raw) To: David Gibson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Thu, 2012-02-02 at 12:24 +1100, David Gibson wrote: > On Wed, Feb 01, 2012 at 01:08:39PM -0700, Alex Williamson wrote: > > On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote: > > > This patch series introduces a new infrastructure to the driver core > > > for representing "device isolation groups". That is, groups of > > > devices which can be "isolated" in such a way that the rest of the > > > system can be protected from them, even in the presence of userspace > > > or a guest OS directly driving the devices. > > > > > > Isolation will typically be due to an IOMMU which can safely remap DMA > > > and interrupts coming from these devices. We need to represent whole > > > groups, rather than individual devices, because there are a number of > > > cases where the group can be isolated as a whole, but devices within > > > it cannot be safely isolated from each other - this usually occurs > > > because the IOMMU cannot reliably distinguish which device in the > > > group initiated a transaction. In other words, isolation groups > > > represent the minimum safe granularity for passthrough to guests or > > > userspace. > > > > > > This series provides the core infraustrcture for tracking isolation > > > groups, and example implementations initializing the groups > > > appropriately for two PCI bridges (which include IOMMUs) found on IBM > > > POWER systems. > > > > > > Actually using the group information is not included here, but David > > > Woodhouse has expressed an interest in using a structure like this to > > > represent operations in iommu_ops more correctly. > > > > > > Some tracking of groups is a prerequisite for safe passthrough of > > > devices to guests or userspace, such as done by VFIO. Current VFIO > > > patches use the iommu_ops->device_group mechanism for this. However, > > > that mechanism is awkward, because without an in-kernel concrete > > > representation of groups, enumerating a group requires traversing > > > every device on a given bus type. It also fails to cover some very > > > plausible IOMMU topologies, because its groups cannot span devices on > > > multiple bus types. > > > > So far so good, but there's not much meat on the bone yet. > > True.. Any update to this series? It would be great if we could map out the functionality to the point of breaking down and distributing work... or determining if the end result has any value add to what VFIO already does. Thanks, Alex ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: RFC: Device isolation groups 2012-02-29 19:30 ` Alex Williamson @ 2012-03-09 3:40 ` David Gibson 0 siblings, 0 replies; 18+ messages in thread From: David Gibson @ 2012-03-09 3:40 UTC (permalink / raw) To: Alex Williamson Cc: dwmw2, iommu, aik, benh, qemu-devel, joerg.roedel, kvm, linux-kernel On Wed, Feb 29, 2012 at 12:30:55PM -0700, Alex Williamson wrote: > On Thu, 2012-02-02 at 12:24 +1100, David Gibson wrote: > > On Wed, Feb 01, 2012 at 01:08:39PM -0700, Alex Williamson wrote: [snip] > Any update to this series? It would be great if we could map out the > functionality to the point of breaking down and distributing work... or > determining if the end result has any value add to what VFIO already > does. Thanks, Yes and no. No real change on the isolation code per se. I had been hoping for feedback from David Woodhouse, but I guess he's been too busy. In the meantime, however, Alexey has been working on a different approach to doing PCI passthrough which is more suitable for our machines. It is based on passing through an entire virtual host bridge (which could be a whole host side host bridge, or a subset, depending on host isolation capabilities), rather than individual devices. This makes it substantially simpler than VFIO (we don't need to virtualize config space or device addresses), and it provides better enforcement of isolation guarantees (VFIO isolation can be broken if devices have MMIO backdoors to config space, or if they can be made to DMA to other devices MMIO addresses instead of RAM addresses), but does require suitable bridge hardware - pSeries has such hardware, x86 mostly doesn't (although it wouldn't surprise me if large server class x86 machines do or will provide the necessary things). Even on this sort of hardware the device-centred VFIO approach may have uses, since it might allow finer grained division, at the cost of isolation enforcement. This provides a more concrete case for the isolation infrastructure, since it would allow the virtual-PHB and VFIO approaches to co-exist. As Alexey's prototype comes into shape, it should illuminate what other content we need in the isolation infrastructure to make it fully usable. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2012-03-09 3:40 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-02-01 4:46 RFC: Device isolation groups David Gibson 2012-02-01 4:46 ` [PATCH 1/3] Device isolation group infrastructure (v3) David Gibson 2012-02-08 15:27 ` Joerg Roedel 2012-02-08 21:39 ` Benjamin Herrenschmidt 2012-02-09 11:28 ` Joerg Roedel 2012-02-10 0:21 ` David Gibson 2012-02-08 21:43 ` Benjamin Herrenschmidt 2012-02-09 1:40 ` David Gibson 2012-02-01 4:46 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson 2012-02-01 18:58 ` Alex Williamson 2012-02-01 19:15 ` Alex Williamson 2012-02-01 4:46 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson 2012-02-01 19:17 ` Alex Williamson 2012-02-02 0:23 ` David Gibson 2012-02-01 20:08 ` RFC: Device isolation groups Alex Williamson 2012-02-02 1:24 ` David Gibson 2012-02-29 19:30 ` Alex Williamson 2012-03-09 3:40 ` David Gibson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).