linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] IOMMU: Groups support
@ 2012-05-30 20:18 Alex Williamson
  2012-05-30 20:18 ` [PATCH v2 1/7] driver core: Add iommu_group tracking to struct device Alex Williamson
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:18 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

v2:
 - Trickle down changes from pci_get_dma_source() to better handle
   PCI device reference counting in IOMMU driver patches

This depends on v2 PCI patches:

http://marc.info/?l=linux-kernel&m=133840870510712

Thanks,

Alex

v1:
These are the IOMMU related patches from the v2 IOMMU Groups +
VFIO series.  I've separated out PCI and VFIO to try to make
progress on the foundation we need for VFIO.  Patches 5 & 6
of this series depend on patches 1 & 2 of the PCI series found
here:

http://marc.info/?l=linux-kernel&m=133835363021384

Patch 1 adds an iommu_group pointer to struct device so that we
have a place to link a device to a specific iommu group.  GregKH
has already acked this, so it should go in with the group support
itself.  Patch 2 is the core of the IOMMU group support.  Patches
3 & 4 allow groups to be created on AMD-Vi and Intel VT-d systems.
Patches 5 & 6 make use of the additional PCI DMA quirks and ACS
support checking to make groups more secure.  Patch 7 removes the
iommu=group_mf option as ACS checking results in most multifunction
devices being grouped already.

These patches, as well as the PCI support patches and VFIO can be
found in git here:

git://github.com/awilliam/linux-vfio.git (iommu-group-vfio-next-20120529)

Please consider these for 3.5, but I'll settle for any kind of next
branch.  Thanks,

Alex

---

Alex Williamson (7):
      iommu: Remove group_mf
      intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups
      amd_iommu: Make use of DMA quirks and ACS checks in IOMMU groups
      intel-iommu: Support IOMMU groups
      amd_iommu: Support IOMMU groups
      iommu: IOMMU Groups
      driver core: Add iommu_group tracking to struct device


 .../ABI/testing/sysfs-kernel-iommu_groups          |   14 
 Documentation/kernel-parameters.txt                |    1 
 arch/ia64/include/asm/iommu.h                      |    2 
 arch/ia64/kernel/pci-dma.c                         |    1 
 arch/x86/include/asm/iommu.h                       |    1 
 arch/x86/kernel/pci-dma.c                          |   11 
 drivers/iommu/amd_iommu.c                          |   70 ++
 drivers/iommu/intel-iommu.c                        |   89 ++-
 drivers/iommu/iommu.c                              |  578 +++++++++++++++++++-
 include/linux/device.h                             |    2 
 include/linux/iommu.h                              |  104 +++-
 11 files changed, 767 insertions(+), 106 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/7] driver core: Add iommu_group tracking to struct device
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
@ 2012-05-30 20:18 ` Alex Williamson
  2012-06-20  5:45   ` Benjamin Herrenschmidt
  2012-05-30 20:18 ` [PATCH v2 2/7] iommu: IOMMU Groups Alex Williamson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:18 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

IOMMU groups allow IOMMU drivers to represent DMA visibility
and isolation of devices.  Multiple devices may be grouped
together for the purposes of DMA.  Placing a pointer on
struct device enable easy access for things like streaming
DMA programming and drivers like VFIO.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 include/linux/device.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 161d962..d0e4d99 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -36,6 +36,7 @@ struct subsys_private;
 struct bus_type;
 struct device_node;
 struct iommu_ops;
+struct iommu_group;
 
 struct bus_attribute {
 	struct attribute	attr;
@@ -687,6 +688,7 @@ struct device {
 	const struct attribute_group **groups;	/* optional groups */
 
 	void	(*release)(struct device *dev);
+	struct iommu_group	*iommu_group;
 };
 
 /* Get the wakeup routines, which depend on struct device */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/7] iommu: IOMMU Groups
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
  2012-05-30 20:18 ` [PATCH v2 1/7] driver core: Add iommu_group tracking to struct device Alex Williamson
@ 2012-05-30 20:18 ` Alex Williamson
  2012-06-20 10:01   ` Benjamin Herrenschmidt
  2012-05-30 20:19 ` [PATCH v2 3/7] amd_iommu: Support IOMMU groups Alex Williamson
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:18 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

IOMMU device groups are currently a rather vague associative notion
with assembly required by the user or user level driver provider to
do anything useful.  This patch intends to grow the IOMMU group concept
into something a bit more consumable.

To do this, we first create an object representing the group, struct
iommu_group.  This structure is allocated (iommu_group_alloc) and
filled (iommu_group_add_device) by the iommu driver.  The iommu driver
is free to add devices to the group using it's own set of policies.
This allows inclusion of devices based on physical hardware or topology
limitations of the platform, as well as soft requirements, such as
multi-function trust levels or peer-to-peer protection of the
interconnects.  Each device may only belong to a single iommu group,
which is linked from struct device.iommu_group.  IOMMU groups are
maintained using kobject reference counting, allowing for automatic
removal of empty, unreferenced groups.  It is the responsibility of
the iommu driver to remove devices from the group
(iommu_group_remove_device).

IOMMU groups also include a userspace representation in sysfs under
/sys/kernel/iommu_groups.  When allocated, each group is given a
dynamically assign ID (int).  The ID is managed by the core IOMMU group
code to support multiple heterogeneous iommu drivers, which could
potentially collide in group naming/numbering.  This also keeps group
IDs to small, easily managed values.  A directory is created under
/sys/kernel/iommu_groups for each group.  A further subdirectory named
"devices" contains links to each device within the group.  The iommu_group
file in the device's sysfs directory, which formerly contained a group
number when read, is now a link to the iommu group.  Example:

$ ls -l /sys/kernel/iommu_groups/26/devices/
total 0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
		../../../../devices/pci0000:00/0000:00:1e.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1

$ ls -l  /sys/kernel/iommu_groups/26/devices/*/iommu_group
[truncating perms/owner/timestamp]
/sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
					../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
					../../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
					../../../../kernel/iommu_groups/26

Groups also include several exported functions for use by user level
driver providers, for example VFIO.  These include:

iommu_group_get(): Acquires a reference to a group from a device
iommu_group_put(): Releases reference
iommu_group_for_each_dev(): Iterates over group devices using callback
iommu_group_[un]register_notifier(): Allows notification of device add
        and remove operations relevant to the group
iommu_group_id(): Return the group number

This patch also extends the IOMMU API to allow attaching groups to
domains.  This is currently a simple wrapper for iterating through
devices within a group, but it's expected that the IOMMU API may
eventually make groups a more integral part of domains.

Groups intentionally do not try to manage group ownership.  A user
level driver provider must independently acquire ownership for each
device within a group before making use of the group as a whole.
This may change in the future if group usage becomes more pervasive
across both DMA and IOMMU ops.

Groups intentionally do not provide a mechanism for driver locking
or otherwise manipulating driver matching/probing of devices within
the group.  Such interfaces are generic to devices and beyond the
scope of IOMMU groups.  If implemented, user level providers have
ready access via iommu_group_for_each_dev and group notifiers.

iommu_device_group() is removed here as it has no users.  The
replacement is:

	group = iommu_group_get(dev);
	id = iommu_group_id(group);
	iommu_group_put(group);

AMD-Vi & Intel VT-d support re-added in following patches.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 .../ABI/testing/sysfs-kernel-iommu_groups          |   14 
 drivers/iommu/amd_iommu.c                          |   21 -
 drivers/iommu/intel-iommu.c                        |   49 --
 drivers/iommu/iommu.c                              |  578 +++++++++++++++++++-
 include/linux/iommu.h                              |  104 +++-
 5 files changed, 663 insertions(+), 103 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
new file mode 100644
index 0000000..9b31556
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -0,0 +1,14 @@
+What:		/sys/kernel/iommu_groups/
+Date:		May 2012
+KernelVersion:	v3.5
+Contact:	Alex Williamson <alex.williamson@redhat.com>
+Description:	/sys/kernel/iommu_groups/ contains a number of sub-
+		directories, each representing an IOMMU group.  The
+		name of the sub-directory matches the iommu_group_id()
+		for the group, which is an integer value.  Within each
+		subdirectory is another directory named "devices" with
+		links to the sysfs devices contained in this group.
+		The group directory also optionally contains a "name"
+		file if the IOMMU driver has chosen to register a more
+		common name for the group.
+Users:
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index d90a421..0ad46f1 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3210,26 +3210,6 @@ static int amd_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
-static int amd_iommu_device_group(struct device *dev, unsigned int *groupid)
-{
-	struct iommu_dev_data *dev_data = dev->archdata.iommu;
-	struct pci_dev *pdev = to_pci_dev(dev);
-	u16 devid;
-
-	if (!dev_data)
-		return -ENODEV;
-
-	if (pdev->is_virtfn || !iommu_group_mf)
-		devid = dev_data->devid;
-	else
-		devid = calc_devid(pdev->bus->number,
-				   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
-
-	*groupid = amd_iommu_alias_table[devid];
-
-	return 0;
-}
-
 static struct iommu_ops amd_iommu_ops = {
 	.domain_init = amd_iommu_domain_init,
 	.domain_destroy = amd_iommu_domain_destroy,
@@ -3239,7 +3219,6 @@ static struct iommu_ops amd_iommu_ops = {
 	.unmap = amd_iommu_unmap,
 	.iova_to_phys = amd_iommu_iova_to_phys,
 	.domain_has_cap = amd_iommu_domain_has_cap,
-	.device_group = amd_iommu_device_group,
 	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
 };
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b12af2f..c62f2df 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4090,54 +4090,6 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
-/*
- * Group numbers are arbitrary.  Device with the same group number
- * indicate the iommu cannot differentiate between them.  To avoid
- * tracking used groups we just use the seg|bus|devfn of the lowest
- * level we're able to differentiate devices
- */
-static int intel_iommu_device_group(struct device *dev, unsigned int *groupid)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct pci_dev *bridge;
-	union {
-		struct {
-			u8 devfn;
-			u8 bus;
-			u16 segment;
-		} pci;
-		u32 group;
-	} id;
-
-	if (iommu_no_mapping(dev))
-		return -ENODEV;
-
-	id.pci.segment = pci_domain_nr(pdev->bus);
-	id.pci.bus = pdev->bus->number;
-	id.pci.devfn = pdev->devfn;
-
-	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
-		return -ENODEV;
-
-	bridge = pci_find_upstream_pcie_bridge(pdev);
-	if (bridge) {
-		if (pci_is_pcie(bridge)) {
-			id.pci.bus = bridge->subordinate->number;
-			id.pci.devfn = 0;
-		} else {
-			id.pci.bus = bridge->bus->number;
-			id.pci.devfn = bridge->devfn;
-		}
-	}
-
-	if (!pdev->is_virtfn && iommu_group_mf)
-		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
-
-	*groupid = id.group;
-
-	return 0;
-}
-
 static struct iommu_ops intel_iommu_ops = {
 	.domain_init	= intel_iommu_domain_init,
 	.domain_destroy = intel_iommu_domain_destroy,
@@ -4147,7 +4099,6 @@ static struct iommu_ops intel_iommu_ops = {
 	.unmap		= intel_iommu_unmap,
 	.iova_to_phys	= intel_iommu_iova_to_phys,
 	.domain_has_cap = intel_iommu_domain_has_cap,
-	.device_group	= intel_iommu_device_group,
 	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
 };
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8b9ded8..0e928ac 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -26,60 +26,535 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/iommu.h>
+#include <linux/idr.h>
+#include <linux/notifier.h>
+#include <linux/err.h>
+
+static struct kset *iommu_group_kset;
+static struct ida iommu_group_ida;
+static struct mutex iommu_group_mutex;
+
+struct iommu_group {
+	struct kobject kobj;
+	struct kobject *devices_kobj;
+	struct list_head devices;
+	struct mutex mutex;
+	struct blocking_notifier_head notifier;
+	void *iommu_data;
+	void (*iommu_data_release)(void *iommu_data);
+	char *name;
+	int id;
+};
+
+struct iommu_device {
+	struct list_head list;
+	struct device *dev;
+	char *name;
+};
+
+struct iommu_group_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct iommu_group *group, char *buf);
+	ssize_t (*store)(struct iommu_group *group,
+			 const char *buf, size_t count);
+};
+
+#define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
+struct iommu_group_attribute iommu_group_attr_##_name =		\
+	__ATTR(_name, _mode, _show, _store)
 
-static ssize_t show_iommu_group(struct device *dev,
-				struct device_attribute *attr, char *buf)
+#define to_iommu_group_attr(_attr)	\
+	container_of(_attr, struct iommu_group_attribute, attr)
+#define to_iommu_group(_kobj)		\
+	container_of(_kobj, struct iommu_group, kobj)
+
+static ssize_t iommu_group_attr_show(struct kobject *kobj,
+				     struct attribute *__attr, char *buf)
 {
-	unsigned int groupid;
+	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
+	struct iommu_group *group = to_iommu_group(kobj);
+	ssize_t ret = -EIO;
 
-	if (iommu_device_group(dev, &groupid))
-		return 0;
+	if (attr->show)
+		ret = attr->show(group, buf);
+	return ret;
+}
+
+static ssize_t iommu_group_attr_store(struct kobject *kobj,
+				      struct attribute *__attr,
+				      const char *buf, size_t count)
+{
+	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
+	struct iommu_group *group = to_iommu_group(kobj);
+	ssize_t ret = -EIO;
 
-	return sprintf(buf, "%u", groupid);
+	if (attr->store)
+		ret = attr->store(group, buf, count);
+	return ret;
 }
-static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
 
-static int add_iommu_group(struct device *dev, void *data)
+static const struct sysfs_ops iommu_group_sysfs_ops = {
+	.show = iommu_group_attr_show,
+	.store = iommu_group_attr_store,
+};
+
+static int iommu_group_create_file(struct iommu_group *group,
+				   struct iommu_group_attribute *attr)
+{
+	return sysfs_create_file(&group->kobj, &attr->attr);
+}
+
+static void iommu_group_remove_file(struct iommu_group *group,
+				    struct iommu_group_attribute *attr)
+{
+	sysfs_remove_file(&group->kobj, &attr->attr);
+}
+
+static ssize_t iommu_group_show_name(struct iommu_group *group, char *buf)
+{
+	return sprintf(buf, "%s\n", group->name);
+}
+
+static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
+
+static void iommu_group_release(struct kobject *kobj)
+{
+	struct iommu_group *group = to_iommu_group(kobj);
+
+	if (group->iommu_data_release)
+		group->iommu_data_release(group->iommu_data);
+
+	mutex_lock(&iommu_group_mutex);
+	ida_remove(&iommu_group_ida, group->id);
+	mutex_unlock(&iommu_group_mutex);
+
+	kfree(group->name);
+	kfree(group);
+}
+
+static struct kobj_type iommu_group_ktype = {
+	.sysfs_ops = &iommu_group_sysfs_ops,
+	.release = iommu_group_release,
+};
+
+/**
+ * iommu_group_alloc - Allocate a new group
+ * @name: Optional name to associate with group, visible in sysfs
+ *
+ * This function is called by an iommu driver to allocate a new iommu
+ * group.  The iommu group represents the minimum granularity of the iommu.
+ * Upon successful return, the caller holds a reference to the supplied
+ * group in order to hold the group until devices are added.  Use
+ * iommu_group_put() to release this extra reference count, allowing the
+ * group to be automatically reclaimed once it has no devices or external
+ * references.
+ */
+struct iommu_group *iommu_group_alloc(void)
 {
-	unsigned int groupid;
+	struct iommu_group *group;
+	int ret;
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group)
+		return ERR_PTR(-ENOMEM);
+
+	group->kobj.kset = iommu_group_kset;
+	mutex_init(&group->mutex);
+	INIT_LIST_HEAD(&group->devices);
+	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
+
+	mutex_lock(&iommu_group_mutex);
+
+again:
+	if (unlikely(0 == ida_pre_get(&iommu_group_ida, GFP_KERNEL))) {
+		kfree(group);
+		mutex_unlock(&iommu_group_mutex);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (-EAGAIN == ida_get_new(&iommu_group_ida, &group->id))
+		goto again;
+
+	mutex_unlock(&iommu_group_mutex);
 
-	if (iommu_device_group(dev, &groupid) == 0)
-		return device_create_file(dev, &dev_attr_iommu_group);
+	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
+				   NULL, "%d", group->id);
+	if (ret) {
+		mutex_lock(&iommu_group_mutex);
+		ida_remove(&iommu_group_ida, group->id);
+		mutex_unlock(&iommu_group_mutex);
+		kfree(group);
+		return ERR_PTR(ret);
+	}
+
+	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
+	if (!group->devices_kobj) {
+		kobject_put(&group->kobj); /* triggers .release & free */
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/*
+	 * The devices_kobj holds a reference on the group kobject, so
+	 * as long as that exists so will the group.  We can therefore
+	 * use the devices_kobj for reference counting.
+	 */
+	kobject_put(&group->kobj);
+
+	return group;
+}
+EXPORT_SYMBOL_GPL(iommu_group_alloc);
+
+/**
+ * iommu_group_get_iommudata - retrieve iommu_data registered for a group
+ * @group: the group
+ *
+ * iommu drivers can store data in the group for use when doing iommu
+ * operations.  This function provides a way to retrieve it.  Caller
+ * should hold a group reference.
+ */
+void *iommu_group_get_iommudata(struct iommu_group *group)
+{
+	return group->iommu_data;
+}
+EXPORT_SYMBOL_GPL(iommu_group_get_iommudata);
+
+/**
+ * iommu_group_set_iommudata - set iommu_data for a group
+ * @group: the group
+ * @iommu_data: new data
+ * @release: release function for iommu_data
+ *
+ * iommu drivers can store data in the group for use when doing iommu
+ * operations.  This function provides a way to set the data after
+ * the group has been allocated.  Caller should hold a group reference.
+ */
+void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
+			       void (*release)(void *iommu_data))
+{
+	group->iommu_data = iommu_data;
+	group->iommu_data_release = release;
+}
+EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);
+
+/**
+ * iommu_group_set_name - set name for a group
+ * @group: the group
+ * @name: name
+ *
+ * Allow iommu driver to set a name for a group.  When set it will
+ * appear in a name attribute file under the group in sysfs.
+ */
+int iommu_group_set_name(struct iommu_group *group, const char *name)
+{
+	int ret;
+
+	if (group->name) {
+		iommu_group_remove_file(group, &iommu_group_attr_name);
+		kfree(group->name);
+		group->name = NULL;
+		if (!name)
+			return 0;
+	}
+
+	group->name = kstrdup(name, GFP_KERNEL);
+	if (!group->name)
+		return -ENOMEM;
+
+	ret = iommu_group_create_file(group, &iommu_group_attr_name);
+	if (ret) {
+		kfree(group->name);
+		group->name = NULL;
+		return ret;
+	}
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(iommu_group_set_name);
 
-static int remove_iommu_group(struct device *dev)
+/**
+ * iommu_group_add_device - add a device to an iommu group
+ * @group: the group into which to add the device (reference should be held)
+ * @dev: the device
+ *
+ * This function is called by an iommu driver to add a device into a
+ * group.  Adding a device increments the group reference count.
+ */
+int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 {
-	unsigned int groupid;
+	int ret, i = 0;
+	struct iommu_device *device;
+
+	device = kzalloc(sizeof(*device), GFP_KERNEL);
+	if (!device)
+		return -ENOMEM;
+
+	device->dev = dev;
 
-	if (iommu_device_group(dev, &groupid) == 0)
-		device_remove_file(dev, &dev_attr_iommu_group);
+	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
+	if (ret) {
+		kfree(device);
+		return ret;
+	}
+
+	device->name = kasprintf(GFP_KERNEL, "%s", kobject_name(&dev->kobj));
+rename:
+	if (!device->name) {
+		sysfs_remove_link(&dev->kobj, "iommu_group");
+		kfree(device);
+		return -ENOMEM;
+	}
 
+	ret = sysfs_create_link_nowarn(group->devices_kobj,
+				       &dev->kobj, device->name);
+	if (ret) {
+		kfree(device->name);
+		if (ret == -EEXIST && i >= 0) {
+			/*
+			 * Account for the slim chance of collision
+			 * and append an instance to the name.
+			 */
+			device->name = kasprintf(GFP_KERNEL, "%s.%d",
+						 kobject_name(&dev->kobj), i++);
+			goto rename;
+		}
+
+		sysfs_remove_link(&dev->kobj, "iommu_group");
+		kfree(device);
+		return ret;
+	}
+
+	kobject_get(group->devices_kobj);
+
+	dev->iommu_group = group;
+
+	mutex_lock(&group->mutex);
+	list_add_tail(&device->list, &group->devices);
+	mutex_unlock(&group->mutex);
+
+	/* Notify any listeners about change to group. */
+	blocking_notifier_call_chain(&group->notifier,
+				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(iommu_group_add_device);
 
-static int iommu_device_notifier(struct notifier_block *nb,
-				 unsigned long action, void *data)
+/**
+ * iommu_group_remove_device - remove a device from it's current group
+ * @dev: device to be removed
+ *
+ * This function is called by an iommu driver to remove the device from
+ * it's current group.  This decrements the iommu group reference count.
+ */
+void iommu_group_remove_device(struct device *dev)
+{
+	struct iommu_group *group = dev->iommu_group;
+	struct iommu_device *tmp_device, *device = NULL;
+
+	/* Pre-notify listeners that a device is being removed. */
+	blocking_notifier_call_chain(&group->notifier,
+				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(tmp_device, &group->devices, list) {
+		if (tmp_device->dev == dev) {
+			device = tmp_device;
+			list_del(&device->list);
+			break;
+		}
+	}
+	mutex_unlock(&group->mutex);
+
+	if (!device)
+		return;
+
+	sysfs_remove_link(group->devices_kobj, device->name);
+	sysfs_remove_link(&dev->kobj, "iommu_group");
+
+	kfree(device->name);
+	kfree(device);
+	dev->iommu_group = NULL;
+	kobject_put(group->devices_kobj);
+}
+EXPORT_SYMBOL_GPL(iommu_group_remove_device);
+
+/**
+ * iommu_group_for_each_dev - iterate over each device in the group
+ * @group: the group
+ * @data: caller opaque data to be passed to callback function
+ * @fn: caller supplied callback function
+ *
+ * This function is called by group users to iterate over group devices.
+ * Callers should hold a reference count to the group during callback.
+ * The group->mutex is held across callbacks, which will block calls to
+ * iommu_group_add/remove_device.
+ */
+int iommu_group_for_each_dev(struct iommu_group *group, void *data,
+			     int (*fn)(struct device *, void *))
+{
+	struct iommu_device *device;
+	int ret = 0;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list) {
+		ret = fn(device->dev, data);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_group_for_each_dev);
+
+/**
+ * iommu_group_get - Return the group for a device and increment reference
+ * @dev: get the group that this device belongs to
+ *
+ * This function is called by iommu drivers and users to get the group
+ * for the specified device.  If found, the group is returned and the group
+ * reference in incremented, else NULL.
+ */
+struct iommu_group *iommu_group_get(struct device *dev)
+{
+	struct iommu_group *group = dev->iommu_group;
+
+	if (group)
+		kobject_get(group->devices_kobj);
+
+	return group;
+}
+EXPORT_SYMBOL_GPL(iommu_group_get);
+
+/**
+ * iommu_group_put - Decrement group reference
+ * @group: the group to use
+ *
+ * This function is called by iommu drivers and users to release the
+ * iommu group.  Once the reference count is zero, the group is released.
+ */
+void iommu_group_put(struct iommu_group *group)
+{
+	if (group)
+		kobject_put(group->devices_kobj);
+}
+EXPORT_SYMBOL_GPL(iommu_group_put);
+
+/**
+ * iommu_group_register_notifier - Register a notifier for group changes
+ * @group: the group to watch
+ * @nb: notifier block to signal
+ *
+ * This function allows iommu group users to track changes in a group.
+ * See include/linux/iommu.h for actions sent via this notifier.  Caller
+ * should hold a reference to the group throughout notifier registration.
+ */
+int iommu_group_register_notifier(struct iommu_group *group,
+				  struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&group->notifier, nb);
+}
+EXPORT_SYMBOL_GPL(iommu_group_register_notifier);
+
+/**
+ * iommu_group_unregister_notifier - Unregister a notifier
+ * @group: the group to watch
+ * @nb: notifier block to signal
+ *
+ * Unregister a previously registered group notifier block.
+ */
+int iommu_group_unregister_notifier(struct iommu_group *group,
+				    struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&group->notifier, nb);
+}
+EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
+
+/**
+ * iommu_group_id - Return ID for a group
+ * @group: the group to ID
+ *
+ * Return the unique ID for the group matching the sysfs group number.
+ */
+int iommu_group_id(struct iommu_group *group)
+{
+	return group->id;
+}
+EXPORT_SYMBOL_GPL(iommu_group_id);
+
+static int add_iommu_group(struct device *dev, void *data)
+{
+	struct iommu_ops *ops = data;
+
+	if (!ops->add_device)
+		return -ENODEV;
+
+	WARN_ON(dev->iommu_group);
+
+	ops->add_device(dev);
+
+	return 0;
+}
+
+static int iommu_bus_notifier(struct notifier_block *nb,
+			      unsigned long action, void *data)
 {
 	struct device *dev = data;
+	struct iommu_ops *ops = dev->bus->iommu_ops;
+	struct iommu_group *group;
+	unsigned long group_action = 0;
+
+	/*
+	 * ADD/DEL call into iommu driver ops if provided, which may
+	 * result in ADD/DEL notifiers to group->notifier
+	 */
+	if (action == BUS_NOTIFY_ADD_DEVICE) {
+		if (ops->add_device)
+			return ops->add_device(dev);
+	} else if (action == BUS_NOTIFY_DEL_DEVICE) {
+		if (ops->remove_device && dev->iommu_group) {
+			ops->remove_device(dev);
+			return 0;
+		}
+	}
 
-	if (action == BUS_NOTIFY_ADD_DEVICE)
-		return add_iommu_group(dev, NULL);
-	else if (action == BUS_NOTIFY_DEL_DEVICE)
-		return remove_iommu_group(dev);
+	/*
+	 * Remaining BUS_NOTIFYs get filtered and republished to the
+	 * group, if anyone is listening
+	 */
+	group = iommu_group_get(dev);
+	if (!group)
+		return 0;
 
+	switch (action) {
+	case BUS_NOTIFY_BIND_DRIVER:
+		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
+		break;
+	case BUS_NOTIFY_BOUND_DRIVER:
+		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
+		break;
+	case BUS_NOTIFY_UNBIND_DRIVER:
+		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
+		break;
+	case BUS_NOTIFY_UNBOUND_DRIVER:
+		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
+		break;
+	}
+
+	if (group_action)
+		blocking_notifier_call_chain(&group->notifier,
+					     group_action, dev);
+
+	iommu_group_put(group);
 	return 0;
 }
 
-static struct notifier_block iommu_device_nb = {
-	.notifier_call = iommu_device_notifier,
+static struct notifier_block iommu_bus_nb = {
+	.notifier_call = iommu_bus_notifier,
 };
 
 static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
 {
-	bus_register_notifier(bus, &iommu_device_nb);
-	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
+	bus_register_notifier(bus, &iommu_bus_nb);
+	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
 }
 
 /**
@@ -192,6 +667,45 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_device);
 
+/*
+ * IOMMU groups are really the natrual working unit of the IOMMU, but
+ * the IOMMU API works on domains and devices.  Bridge that gap by
+ * iterating over the devices in a group.  Ideally we'd have a single
+ * device which represents the requestor ID of the group, but we also
+ * allow IOMMU drivers to create policy defined minimum sets, where
+ * the physical hardware may be able to distiguish members, but we
+ * wish to group them at a higher level (ex. untrusted multi-function
+ * PCI devices).  Thus we attach each device.
+ */
+static int iommu_group_do_attach_device(struct device *dev, void *data)
+{
+	struct iommu_domain *domain = data;
+
+	return iommu_attach_device(domain, dev);
+}
+
+int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+	return iommu_group_for_each_dev(group, domain,
+					iommu_group_do_attach_device);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_group);
+
+static int iommu_group_do_detach_device(struct device *dev, void *data)
+{
+	struct iommu_domain *domain = data;
+
+	iommu_detach_device(domain, dev);
+
+	return 0;
+}
+
+void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+	iommu_group_for_each_dev(group, domain, iommu_group_do_detach_device);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_group);
+
 phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 			       unsigned long iova)
 {
@@ -336,11 +850,15 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
-int iommu_device_group(struct device *dev, unsigned int *groupid)
+static int __init iommu_init(void)
 {
-	if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group)
-		return dev->bus->iommu_ops->device_group(dev, groupid);
+	iommu_group_kset = kset_create_and_add("iommu_groups",
+					       NULL, kernel_kobj);
+	ida_init(&iommu_group_ida);
+	mutex_init(&iommu_group_mutex);
 
-	return -ENODEV;
+	BUG_ON(!iommu_group_kset);
+
+	return 0;
 }
-EXPORT_SYMBOL_GPL(iommu_device_group);
+subsys_initcall(iommu_init);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 450293f..a71df92 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -26,6 +26,7 @@
 #define IOMMU_CACHE	(4) /* DMA cache coherency */
 
 struct iommu_ops;
+struct iommu_group;
 struct bus_type;
 struct device;
 struct iommu_domain;
@@ -60,6 +61,8 @@ struct iommu_domain {
  * @iova_to_phys: translate iova to physical address
  * @domain_has_cap: domain capabilities query
  * @commit: commit iommu domain
+ * @add_device: add device to iommu grouping
+ * @remove_device: remove device from iommu grouping
  * @pgsize_bitmap: bitmap of supported page sizes
  */
 struct iommu_ops {
@@ -75,10 +78,18 @@ struct iommu_ops {
 				    unsigned long iova);
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
-	int (*device_group)(struct device *dev, unsigned int *groupid);
+	int (*add_device)(struct device *dev);
+	void (*remove_device)(struct device *dev);
 	unsigned long pgsize_bitmap;
 };
 
+#define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
+#define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
+#define IOMMU_GROUP_NOTIFY_BIND_DRIVER		3 /* Pre Driver bind */
+#define IOMMU_GROUP_NOTIFY_BOUND_DRIVER		4 /* Post Driver bind */
+#define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER	5 /* Pre Driver unbind */
+#define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER	6 /* Post Driver unbind */
+
 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
 extern bool iommu_present(struct bus_type *bus);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
@@ -97,7 +108,29 @@ extern int iommu_domain_has_cap(struct iommu_domain *domain,
 				unsigned long cap);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
-extern int iommu_device_group(struct device *dev, unsigned int *groupid);
+
+extern int iommu_attach_group(struct iommu_domain *domain,
+			      struct iommu_group *group);
+extern void iommu_detach_group(struct iommu_domain *domain,
+			       struct iommu_group *group);
+extern struct iommu_group *iommu_group_alloc(void);
+extern void *iommu_group_get_iommudata(struct iommu_group *group);
+extern void iommu_group_set_iommudata(struct iommu_group *group,
+				      void *iommu_data,
+				      void (*release)(void *iommu_data));
+extern int iommu_group_set_name(struct iommu_group *group, const char *name);
+extern int iommu_group_add_device(struct iommu_group *group,
+				  struct device *dev);
+extern void iommu_group_remove_device(struct device *dev);
+extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
+				    int (*fn)(struct device *, void *));
+extern struct iommu_group *iommu_group_get(struct device *dev);
+extern void iommu_group_put(struct iommu_group *group);
+extern int iommu_group_register_notifier(struct iommu_group *group,
+					 struct notifier_block *nb);
+extern int iommu_group_unregister_notifier(struct iommu_group *group,
+					   struct notifier_block *nb);
+extern int iommu_group_id(struct iommu_group *group);
 
 /**
  * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework
@@ -142,6 +175,7 @@ static inline int report_iommu_fault(struct iommu_domain *domain,
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
+struct iommu_group {};
 
 static inline bool iommu_present(struct bus_type *bus)
 {
@@ -197,11 +231,75 @@ static inline void iommu_set_fault_handler(struct iommu_domain *domain,
 {
 }
 
-static inline int iommu_device_group(struct device *dev, unsigned int *groupid)
+int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+	return -ENODEV;
+}
+
+void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+}
+
+struct iommu_group *iommu_group_alloc(void)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+void *iommu_group_get_iommudata(struct iommu_group *group)
+{
+	return NULL;
+}
+
+void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
+			       void (*release)(void *iommu_data))
+{
+}
+
+int iommu_group_set_name(struct iommu_group *group, const char *name)
+{
+	return -ENODEV;
+}
+
+int iommu_group_add_device(struct iommu_group *group, struct device *dev)
+{
+	return -ENODEV;
+}
+
+void iommu_group_remove_device(struct device *dev)
+{
+}
+
+int iommu_group_for_each_dev(struct iommu_group *group, void *data,
+			     int (*fn)(struct device *, void *))
+{
+	return -ENODEV;
+}
+
+struct iommu_group *iommu_group_get(struct device *dev)
+{
+	return NULL;
+}
+
+void iommu_group_put(struct iommu_group *group)
+{
+}
+
+int iommu_group_register_notifier(struct iommu_group *group,
+				  struct notifier_block *nb)
 {
 	return -ENODEV;
 }
 
+int iommu_group_unregister_notifier(struct iommu_group *group,
+				    struct notifier_block *nb)
+{
+	return 0;
+}
+
+int iommu_group_id(struct iommu_group *group)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/7] amd_iommu: Support IOMMU groups
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
  2012-05-30 20:18 ` [PATCH v2 1/7] driver core: Add iommu_group tracking to struct device Alex Williamson
  2012-05-30 20:18 ` [PATCH v2 2/7] iommu: IOMMU Groups Alex Williamson
@ 2012-05-30 20:19 ` Alex Williamson
  2012-05-30 20:19 ` [PATCH v2 4/7] intel-iommu: " Alex Williamson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:19 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

Add IOMMU group support to AMD-Vi device init and uninit code.
Existing notifiers make sure this gets called for each device.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/iommu/amd_iommu.c |   24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 0ad46f1..e072bd1 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -256,9 +256,11 @@ static bool check_device(struct device *dev)
 
 static int iommu_init_device(struct device *dev)
 {
-	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
 	struct iommu_dev_data *dev_data;
+	struct iommu_group *group;
 	u16 alias;
+	int ret;
 
 	if (dev->archdata.iommu)
 		return 0;
@@ -279,8 +281,26 @@ static int iommu_init_device(struct device *dev)
 			return -ENOTSUPP;
 		}
 		dev_data->alias_data = alias_data;
+
+		dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
+	} else
+		dma_pdev = pci_dev_get(pdev);
+
+	group = iommu_group_get(&dma_pdev->dev);
+	pci_dev_put(dma_pdev);
+	if (!group) {
+		group = iommu_group_alloc();
+		if (IS_ERR(group))
+			return PTR_ERR(group);
 	}
 
+	ret = iommu_group_add_device(group, dev);
+
+	iommu_group_put(group);
+
+	if (ret)
+		return ret;
+
 	if (pci_iommuv2_capable(pdev)) {
 		struct amd_iommu *iommu;
 
@@ -309,6 +329,8 @@ static void iommu_ignore_device(struct device *dev)
 
 static void iommu_uninit_device(struct device *dev)
 {
+	iommu_group_remove_device(dev);
+
 	/*
 	 * Nothing to do here - we keep dev_data around for unplugged devices
 	 * and reuse it when the device is re-plugged - not doing so would


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/7] intel-iommu: Support IOMMU groups
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
                   ` (2 preceding siblings ...)
  2012-05-30 20:19 ` [PATCH v2 3/7] amd_iommu: Support IOMMU groups Alex Williamson
@ 2012-05-30 20:19 ` Alex Williamson
  2012-05-30 20:19 ` [PATCH v2 5/7] amd_iommu: Make use of DMA quirks and ACS checks in " Alex Williamson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:19 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

Add IOMMU group support to Intel VT-d code.  This driver sets up
devices ondemand, so make use of the add_device/remove_device
callbacks in IOMMU API to manage setting up the groups.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/iommu/intel-iommu.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c62f2df..4a43452 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4090,6 +4090,47 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
+static int intel_iommu_add_device(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_dev *bridge, *dma_pdev;
+	struct iommu_group *group;
+	int ret;
+
+	if (!device_to_iommu(pci_domain_nr(pdev->bus),
+			     pdev->bus->number, pdev->devfn))
+		return -ENODEV;
+
+	bridge = pci_find_upstream_pcie_bridge(pdev);
+	if (bridge) {
+		if (pci_is_pcie(bridge))
+			dma_pdev = pci_get_domain_bus_and_slot(
+						pci_domain_nr(pdev->bus),
+						bridge->subordinate->number, 0);
+		else
+			dma_pdev = pci_dev_get(bridge);
+	} else
+		dma_pdev = pci_dev_get(pdev);
+
+	group = iommu_group_get(&dma_pdev->dev);
+	pci_dev_put(dma_pdev);
+	if (!group) {
+		group = iommu_group_alloc();
+		if (IS_ERR(group))
+			return PTR_ERR(group);
+	}
+
+	ret = iommu_group_add_device(group, dev);
+
+	iommu_group_put(group);
+	return ret;
+}
+
+static void intel_iommu_remove_device(struct device *dev)
+{
+	iommu_group_remove_device(dev);
+}
+
 static struct iommu_ops intel_iommu_ops = {
 	.domain_init	= intel_iommu_domain_init,
 	.domain_destroy = intel_iommu_domain_destroy,
@@ -4099,6 +4140,8 @@ static struct iommu_ops intel_iommu_ops = {
 	.unmap		= intel_iommu_unmap,
 	.iova_to_phys	= intel_iommu_iova_to_phys,
 	.domain_has_cap = intel_iommu_domain_has_cap,
+	.add_device	= intel_iommu_add_device,
+	.remove_device	= intel_iommu_remove_device,
 	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
 };
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/7] amd_iommu: Make use of DMA quirks and ACS checks in IOMMU groups
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
                   ` (3 preceding siblings ...)
  2012-05-30 20:19 ` [PATCH v2 4/7] intel-iommu: " Alex Williamson
@ 2012-05-30 20:19 ` Alex Williamson
  2012-05-30 20:19 ` [PATCH v2 6/7] intel-iommu: " Alex Williamson
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:19 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

Work around broken devices and adhere to ACS support when determining
IOMMU grouping.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/iommu/amd_iommu.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e072bd1..0e78d3c 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -254,6 +254,14 @@ static bool check_device(struct device *dev)
 	return true;
 }
 
+static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
+{
+	pci_dev_put(*from);
+	*from = to;
+}
+
+#define REQ_ACS_FLAGS	(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
+
 static int iommu_init_device(struct device *dev)
 {
 	struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
@@ -286,6 +294,23 @@ static int iommu_init_device(struct device *dev)
 	} else
 		dma_pdev = pci_dev_get(pdev);
 
+	swap_pci_ref(&dma_pdev, pci_get_dma_source(dma_pdev));
+
+	if (dma_pdev->multifunction &&
+	    !pci_acs_enabled(dma_pdev, REQ_ACS_FLAGS))
+		swap_pci_ref(&dma_pdev,
+			     pci_get_slot(dma_pdev->bus,
+					  PCI_DEVFN(PCI_SLOT(dma_pdev->devfn),
+					  0)));
+
+	while (!pci_is_root_bus(dma_pdev->bus)) {
+		if (pci_acs_path_enabled(dma_pdev->bus->self,
+					 NULL, REQ_ACS_FLAGS))
+			break;
+
+		swap_pci_ref(&dma_pdev, pci_dev_get(dma_pdev->bus->self));
+	}
+
 	group = iommu_group_get(&dma_pdev->dev);
 	pci_dev_put(dma_pdev);
 	if (!group) {


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 6/7] intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
                   ` (4 preceding siblings ...)
  2012-05-30 20:19 ` [PATCH v2 5/7] amd_iommu: Make use of DMA quirks and ACS checks in " Alex Williamson
@ 2012-05-30 20:19 ` Alex Williamson
  2012-05-31 19:47   ` Don Dutile
  2012-05-30 20:19 ` [PATCH v2 7/7] iommu: Remove group_mf Alex Williamson
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:19 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

Work around broken devices and adhere to ACS support when determining
IOMMU grouping.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/iommu/intel-iommu.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4a43452..ebf2b31 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4090,6 +4090,14 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
+static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
+{
+	pci_dev_put(*from);
+	*from = to;
+}
+
+#define REQ_ACS_FLAGS	(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
+
 static int intel_iommu_add_device(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -4112,6 +4120,23 @@ static int intel_iommu_add_device(struct device *dev)
 	} else
 		dma_pdev = pci_dev_get(pdev);
 
+	swap_pci_ref(&dma_pdev, pci_get_dma_source(dma_pdev));
+
+	if (dma_pdev->multifunction &&
+	    !pci_acs_enabled(dma_pdev, REQ_ACS_FLAGS))
+		swap_pci_ref(&dma_pdev,
+			     pci_get_slot(dma_pdev->bus,
+					  PCI_DEVFN(PCI_SLOT(dma_pdev->devfn),
+					  0)));
+
+	while (!pci_is_root_bus(dma_pdev->bus)) {
+		if (pci_acs_path_enabled(dma_pdev->bus->self,
+					 NULL, REQ_ACS_FLAGS))
+			break;
+
+		swap_pci_ref(&dma_pdev, pci_dev_get(dma_pdev->bus->self));
+	}
+
 	group = iommu_group_get(&dma_pdev->dev);
 	pci_dev_put(dma_pdev);
 	if (!group) {


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 7/7] iommu: Remove group_mf
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
                   ` (5 preceding siblings ...)
  2012-05-30 20:19 ` [PATCH v2 6/7] intel-iommu: " Alex Williamson
@ 2012-05-30 20:19 ` Alex Williamson
  2012-06-06 11:05 ` [PATCH v2 0/7] IOMMU: Groups support Joerg Roedel
  2012-06-25 11:29 ` Joerg Roedel
  8 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-05-30 20:19 UTC (permalink / raw)
  To: joerg.roedel, dwmw2, iommu
  Cc: bhelgaas, benh, aik, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, alex.williamson, liuj97

The iommu=group_mf is really no longer needed with the addition of ACS
support in IOMMU drivers creating groups.  Most multifunction devices
will now be grouped already.  If a device has gone to the trouble of
exposing ACS, trust that it works.  We can use the device specific ACS
function for fixing devices we trust individually.  This largely
reverts bcb71abe.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 Documentation/kernel-parameters.txt |    1 -
 arch/ia64/include/asm/iommu.h       |    2 --
 arch/ia64/kernel/pci-dma.c          |    1 -
 arch/x86/include/asm/iommu.h        |    1 -
 arch/x86/kernel/pci-dma.c           |   11 -----------
 5 files changed, 16 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 0ea6d04..1e38ef3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1128,7 +1128,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 		forcesac
 		soft
 		pt		[x86, IA-64]
-		group_mf	[x86, IA-64]
 
 
 	io7=		[HW] IO7 for Marvel based alpha systems
diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index b6a809f..105c93b 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -11,12 +11,10 @@ extern void no_iommu_init(void);
 extern int force_iommu, no_iommu;
 extern int iommu_pass_through;
 extern int iommu_detected;
-extern int iommu_group_mf;
 #else
 #define iommu_pass_through	(0)
 #define no_iommu		(1)
 #define iommu_detected		(0)
-#define iommu_group_mf		(0)
 #endif
 extern void iommu_dma_init(void);
 extern void machvec_init(const char *name);
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 7cdc89b..1ddcfe5 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -32,7 +32,6 @@ int force_iommu __read_mostly;
 #endif
 
 int iommu_pass_through;
-int iommu_group_mf;
 
 /* Dummy device used for NULL arguments (normally ISA). Better would
    be probably a smaller DMA mask, but this is bug-to-bug compatible
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index dffc38e..345c99c 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -5,7 +5,6 @@ extern struct dma_map_ops nommu_dma_ops;
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
 extern int iommu_pass_through;
-extern int iommu_group_mf;
 
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 62c9457..ac8f823 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -45,15 +45,6 @@ int iommu_detected __read_mostly = 0;
  */
 int iommu_pass_through __read_mostly;
 
-/*
- * Group multi-function PCI devices into a single device-group for the
- * iommu_device_group interface.  This tells the iommu driver to pretend
- * it cannot distinguish between functions of a device, exposing only one
- * group for the device.  Useful for disallowing use of individual PCI
- * functions from userspace drivers.
- */
-int iommu_group_mf __read_mostly;
-
 extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
 
 /* Dummy device used for NULL arguments (normally ISA). */
@@ -193,8 +184,6 @@ static __init int iommu_setup(char *p)
 #endif
 		if (!strncmp(p, "pt", 2))
 			iommu_pass_through = 1;
-		if (!strncmp(p, "group_mf", 8))
-			iommu_group_mf = 1;
 
 		gart_parse_options(p);
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 6/7] intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups
  2012-05-30 20:19 ` [PATCH v2 6/7] intel-iommu: " Alex Williamson
@ 2012-05-31 19:47   ` Don Dutile
  2012-05-31 20:41     ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Don Dutile @ 2012-05-31 19:47 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joerg.roedel, dwmw2, iommu, bhelgaas, benh, aik, david,
	konrad.wilk, linux-kernel, linux-pci, gregkh, liuj97

On 05/30/2012 04:19 PM, Alex Williamson wrote:
> Work around broken devices and adhere to ACS support when determining
> IOMMU grouping.
>
> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
> ---
>
>   drivers/iommu/intel-iommu.c |   25 +++++++++++++++++++++++++
>   1 file changed, 25 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 4a43452..ebf2b31 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -4090,6 +4090,14 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
>   	return 0;
>   }
>
> +static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
> +{
> +	pci_dev_put(*from);
> +	*from = to;
> +}
> +
> +#define REQ_ACS_FLAGS	(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
> +
>   static int intel_iommu_add_device(struct device *dev)
>   {
>   	struct pci_dev *pdev = to_pci_dev(dev);
> @@ -4112,6 +4120,23 @@ static int intel_iommu_add_device(struct device *dev)
>   	} else
>   		dma_pdev = pci_dev_get(pdev);
>
> +	swap_pci_ref(&dma_pdev, pci_get_dma_source(dma_pdev));
> +
> +	if (dma_pdev->multifunction&&
> +	    !pci_acs_enabled(dma_pdev, REQ_ACS_FLAGS))
> +		swap_pci_ref(&dma_pdev,
> +			     pci_get_slot(dma_pdev->bus,
> +					  PCI_DEVFN(PCI_SLOT(dma_pdev->devfn),
> +					  0)));
> +
> +	while (!pci_is_root_bus(dma_pdev->bus)) {
> +		if (pci_acs_path_enabled(dma_pdev->bus->self,
> +					 NULL, REQ_ACS_FLAGS))
> +			break;
> +
> +		swap_pci_ref(&dma_pdev, pci_dev_get(dma_pdev->bus->self));
> +	}
> +
I'm having deja-vu on this patch....
.... why not just make the above two patches as two functions in drivers/iommu/iommu.c,
one exported for these two modules (and maybe others someday...), e.g., iommu_pdev_put())
which [intel-,amd-]iommu.c call ?

  
>   	group = iommu_group_get(&dma_pdev->dev);
>   	pci_dev_put(dma_pdev);
>   	if (!group) {
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 6/7] intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups
  2012-05-31 19:47   ` Don Dutile
@ 2012-05-31 20:41     ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-05-31 20:41 UTC (permalink / raw)
  To: Don Dutile
  Cc: joerg.roedel, dwmw2, iommu, bhelgaas, benh, aik, david,
	konrad.wilk, linux-kernel, linux-pci, gregkh, liuj97

On Thu, 2012-05-31 at 15:47 -0400, Don Dutile wrote:
> On 05/30/2012 04:19 PM, Alex Williamson wrote:
> > Work around broken devices and adhere to ACS support when determining
> > IOMMU grouping.
> >
> > Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
> > ---
> >
> >   drivers/iommu/intel-iommu.c |   25 +++++++++++++++++++++++++
> >   1 file changed, 25 insertions(+)
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 4a43452..ebf2b31 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4090,6 +4090,14 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
> >   	return 0;
> >   }
> >
> > +static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
> > +{
> > +	pci_dev_put(*from);
> > +	*from = to;
> > +}
> > +
> > +#define REQ_ACS_FLAGS	(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
> > +
> >   static int intel_iommu_add_device(struct device *dev)
> >   {
> >   	struct pci_dev *pdev = to_pci_dev(dev);
> > @@ -4112,6 +4120,23 @@ static int intel_iommu_add_device(struct device *dev)
> >   	} else
> >   		dma_pdev = pci_dev_get(pdev);
> >
> > +	swap_pci_ref(&dma_pdev, pci_get_dma_source(dma_pdev));
> > +
> > +	if (dma_pdev->multifunction&&
> > +	    !pci_acs_enabled(dma_pdev, REQ_ACS_FLAGS))
> > +		swap_pci_ref(&dma_pdev,
> > +			     pci_get_slot(dma_pdev->bus,
> > +					  PCI_DEVFN(PCI_SLOT(dma_pdev->devfn),
> > +					  0)));
> > +
> > +	while (!pci_is_root_bus(dma_pdev->bus)) {
> > +		if (pci_acs_path_enabled(dma_pdev->bus->self,
> > +					 NULL, REQ_ACS_FLAGS))
> > +			break;
> > +
> > +		swap_pci_ref(&dma_pdev, pci_dev_get(dma_pdev->bus->self));
> > +	}
> > +
> I'm having deja-vu on this patch....
> .... why not just make the above two patches as two functions in drivers/iommu/iommu.c,
> one exported for these two modules (and maybe others someday...), e.g., iommu_pdev_put())
> which [intel-,amd-]iommu.c call ?

Do we want to put PCI specific code in the IOMMU base code?  I think
we'd want to avoid that.  I have no objection to reducing the
duplication, but I'm not sure who else is going to use this, where to
put it and what to call it.  So, I left the duplication here so it might
get more review from the IOMMU driver maintainers.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] IOMMU: Groups support
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
                   ` (6 preceding siblings ...)
  2012-05-30 20:19 ` [PATCH v2 7/7] iommu: Remove group_mf Alex Williamson
@ 2012-06-06 11:05 ` Joerg Roedel
  2012-06-11 15:37   ` Alex Williamson
  2012-06-25 11:29 ` Joerg Roedel
  8 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2012-06-06 11:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: dwmw2, iommu, bhelgaas, benh, aik, david, konrad.wilk,
	linux-kernel, linux-pci, gregkh, ddutile, liuj97

On Wed, May 30, 2012 at 02:18:29PM -0600, Alex Williamson wrote:
> v2:
>  - Trickle down changes from pci_get_dma_source() to better handle
>    PCI device reference counting in IOMMU driver patches

Okay, I looked again over the code and looks good for merging to me. The
IOMMU Groups code depends on changes in the PCI code. We need to work
out how to merge this.

Bjorn,

I need the PCI changes before I can merge the IOMMU Groups code. There
are two ways that would work for me:

	a) You Ack the PCI patch-set from Alex and I put it into my tree
	   and merge the IOMMU Group code on-top, or
	b) You merge the PCI related patch-set into a seperate branch
	   which you can merge into your next-branch and which I can
	   pull in to merge IOMMU Groups on-top.

Please let me know what works best for you.

What I would also appreciate are a few more Acks from other IOMMU
maintainers (David W.?) and from the Power side.


Thanks,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] IOMMU: Groups support
  2012-06-06 11:05 ` [PATCH v2 0/7] IOMMU: Groups support Joerg Roedel
@ 2012-06-11 15:37   ` Alex Williamson
  2012-06-12  2:55     ` Bjorn Helgaas
  2012-06-18 17:31     ` Alex Williamson
  0 siblings, 2 replies; 21+ messages in thread
From: Alex Williamson @ 2012-06-11 15:37 UTC (permalink / raw)
  To: Joerg Roedel, Alexey Kardashevskiy, Benjamin Herrenschmidt,
	David Woodhouse
  Cc: iommu, bhelgaas, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, liuj97

On Wed, 2012-06-06 at 13:05 +0200, Joerg Roedel wrote:
> On Wed, May 30, 2012 at 02:18:29PM -0600, Alex Williamson wrote:
> > v2:
> >  - Trickle down changes from pci_get_dma_source() to better handle
> >    PCI device reference counting in IOMMU driver patches
> 
> Okay, I looked again over the code and looks good for merging to me.

Great, thanks Joerg!

>  The
> IOMMU Groups code depends on changes in the PCI code. We need to work
> out how to merge this.
> 
> Bjorn,
> 
> I need the PCI changes before I can merge the IOMMU Groups code. There
> are two ways that would work for me:
> 
> 	a) You Ack the PCI patch-set from Alex and I put it into my tree
> 	   and merge the IOMMU Group code on-top, or
> 	b) You merge the PCI related patch-set into a seperate branch
> 	   which you can merge into your next-branch and which I can
> 	   pull in to merge IOMMU Groups on-top.
> 
> Please let me know what works best for you.

I also poked Bjorn with a minor v3 spin on his series, no change to this
series.

> What I would also appreciate are a few more Acks from other IOMMU
> maintainers (David W.?) and from the Power side.

Alexey, BenH, can either of you comment on this series from the POWER
side?  I know Alexey has an implementation of VFIO/Qemu device
assignment working on POWER with this series.  David Woodhouse, any
comments?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] IOMMU: Groups support
  2012-06-11 15:37   ` Alex Williamson
@ 2012-06-12  2:55     ` Bjorn Helgaas
  2012-06-18 17:31     ` Alex Williamson
  1 sibling, 0 replies; 21+ messages in thread
From: Bjorn Helgaas @ 2012-06-12  2:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Joerg Roedel, Alexey Kardashevskiy, Benjamin Herrenschmidt,
	David Woodhouse, iommu, david, konrad.wilk, linux-kernel,
	linux-pci, gregkh, ddutile, liuj97

On Mon, Jun 11, 2012 at 8:37 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Wed, 2012-06-06 at 13:05 +0200, Joerg Roedel wrote:
>> On Wed, May 30, 2012 at 02:18:29PM -0600, Alex Williamson wrote:
>> > v2:
>> >  - Trickle down changes from pci_get_dma_source() to better handle
>> >    PCI device reference counting in IOMMU driver patches
>>
>> Okay, I looked again over the code and looks good for merging to me.
>
> Great, thanks Joerg!
>
>>  The
>> IOMMU Groups code depends on changes in the PCI code. We need to work
>> out how to merge this.
>>
>> Bjorn,
>>
>> I need the PCI changes before I can merge the IOMMU Groups code. There
>> are two ways that would work for me:
>>
>>       a) You Ack the PCI patch-set from Alex and I put it into my tree
>>          and merge the IOMMU Group code on-top, or
>>       b) You merge the PCI related patch-set into a seperate branch
>>          which you can merge into your next-branch and which I can
>>          pull in to merge IOMMU Groups on-top.
>>
>> Please let me know what works best for you.

Joerg, I put the PCI changes into my "next" branch
(http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next).
 My feeble excuse is that I was on vacation last week, but that's
pretty lame because this has been hanging for much longer than that.

Bjorn

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] IOMMU: Groups support
  2012-06-11 15:37   ` Alex Williamson
  2012-06-12  2:55     ` Bjorn Helgaas
@ 2012-06-18 17:31     ` Alex Williamson
  2012-06-19  9:49       ` Joerg Roedel
  1 sibling, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2012-06-18 17:31 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexey Kardashevskiy, Benjamin Herrenschmidt, David Woodhouse,
	iommu, bhelgaas, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, liuj97

Anyone?  Alexey & BenH, I know you're familiar with IOMMU groups, please
give Joerg some kind of indication that you approve of this for POWER.
David Woodhouse, these patches touch intel-iommu, please ack or comment:

https://lkml.org/lkml/2012/5/30/445
https://lkml.org/lkml/2012/5/30/450

Joerg, maybe we can just add CC: to the commit log?  Thanks,

Alex

On Mon, 2012-06-11 at 09:37 -0600, Alex Williamson wrote:
> On Wed, 2012-06-06 at 13:05 +0200, Joerg Roedel wrote:
> > On Wed, May 30, 2012 at 02:18:29PM -0600, Alex Williamson wrote:
> > > v2:
> > >  - Trickle down changes from pci_get_dma_source() to better handle
> > >    PCI device reference counting in IOMMU driver patches
> > 
> > Okay, I looked again over the code and looks good for merging to me.
> 
> Great, thanks Joerg!
> 
> >  The
> > IOMMU Groups code depends on changes in the PCI code. We need to work
> > out how to merge this.
> > 
> > Bjorn,
> > 
> > I need the PCI changes before I can merge the IOMMU Groups code. There
> > are two ways that would work for me:
> > 
> > 	a) You Ack the PCI patch-set from Alex and I put it into my tree
> > 	   and merge the IOMMU Group code on-top, or
> > 	b) You merge the PCI related patch-set into a seperate branch
> > 	   which you can merge into your next-branch and which I can
> > 	   pull in to merge IOMMU Groups on-top.
> > 
> > Please let me know what works best for you.
> 
> I also poked Bjorn with a minor v3 spin on his series, no change to this
> series.
> 
> > What I would also appreciate are a few more Acks from other IOMMU
> > maintainers (David W.?) and from the Power side.
> 
> Alexey, BenH, can either of you comment on this series from the POWER
> side?  I know Alexey has an implementation of VFIO/Qemu device
> assignment working on POWER with this series.  David Woodhouse, any
> comments?  Thanks,
> 
> Alex





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] IOMMU: Groups support
  2012-06-18 17:31     ` Alex Williamson
@ 2012-06-19  9:49       ` Joerg Roedel
  2012-06-19 10:40         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2012-06-19  9:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alexey Kardashevskiy, Benjamin Herrenschmidt, David Woodhouse,
	iommu, bhelgaas, david, konrad.wilk, linux-kernel, linux-pci,
	gregkh, ddutile, liuj97

Hi,

On Mon, Jun 18, 2012 at 11:31:43AM -0600, Alex Williamson wrote:
> Anyone?  Alexey & BenH, I know you're familiar with IOMMU groups, please
> give Joerg some kind of indication that you approve of this for POWER.
> David Woodhouse, these patches touch intel-iommu, please ack or comment:
> 
> https://lkml.org/lkml/2012/5/30/445
> https://lkml.org/lkml/2012/5/30/450
> 
> Joerg, maybe we can just add CC: to the commit log?  Thanks,

Well, we can't hold up things forever. Given that the POWER guys had
long discussions with you about previous versions but not about this one
counts as a silent Ack for me ;)

So I already applied the patches into my local tree yesterday and gave
them some additional testing. All looks good so far. I think I will push
the result soon to get this into linux-next.

I'll probably rebase to add the Cc's to the commits before pushing it
out.

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] IOMMU: Groups support
  2012-06-19  9:49       ` Joerg Roedel
@ 2012-06-19 10:40         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 21+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19 10:40 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alex Williamson, Alexey Kardashevskiy, David Woodhouse, iommu,
	bhelgaas, david, konrad.wilk, linux-kernel, linux-pci, gregkh,
	ddutile, liuj97

On Tue, 2012-06-19 at 11:49 +0200, Joerg Roedel wrote:
> Well, we can't hold up things forever. Given that the POWER guys had
> long discussions with you about previous versions but not about this one
> counts as a silent Ack for me ;)

Everybody was waiting for me and I was out for 3 weeks following some
surgery ... I'll review that stuff ASAP (as in tomorrow if I can, else
before end of the week).

> So I already applied the patches into my local tree yesterday and gave
> them some additional testing. All looks good so far. I think I will push
> the result soon to get this into linux-next.
> 
> I'll probably rebase to add the Cc's to the commits before pushing it
> out.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/7] driver core: Add iommu_group tracking to struct device
  2012-05-30 20:18 ` [PATCH v2 1/7] driver core: Add iommu_group tracking to struct device Alex Williamson
@ 2012-06-20  5:45   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 21+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20  5:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joerg.roedel, dwmw2, iommu, bhelgaas, aik, david, konrad.wilk,
	linux-kernel, linux-pci, gregkh, ddutile, liuj97

On Wed, 2012-05-30 at 14:18 -0600, Alex Williamson wrote:
> IOMMU groups allow IOMMU drivers to represent DMA visibility
> and isolation of devices.  Multiple devices may be grouped
> together for the purposes of DMA.  Placing a pointer on
> struct device enable easy access for things like streaming
> DMA programming and drivers like VFIO.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

> ---
> 
>  include/linux/device.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 161d962..d0e4d99 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -36,6 +36,7 @@ struct subsys_private;
>  struct bus_type;
>  struct device_node;
>  struct iommu_ops;
> +struct iommu_group;
>  
>  struct bus_attribute {
>  	struct attribute	attr;
> @@ -687,6 +688,7 @@ struct device {
>  	const struct attribute_group **groups;	/* optional groups */
>  
>  	void	(*release)(struct device *dev);
> +	struct iommu_group	*iommu_group;
>  };
>  
>  /* Get the wakeup routines, which depend on struct device */
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/7] iommu: IOMMU Groups
  2012-05-30 20:18 ` [PATCH v2 2/7] iommu: IOMMU Groups Alex Williamson
@ 2012-06-20 10:01   ` Benjamin Herrenschmidt
  2012-06-20 16:48     ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 10:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joerg.roedel, dwmw2, iommu, bhelgaas, aik, david, konrad.wilk,
	linux-kernel, linux-pci, gregkh, ddutile, liuj97

On Wed, 2012-05-30 at 14:18 -0600, Alex Williamson wrote:

> IOMMU groups also include a userspace representation in sysfs under
> /sys/kernel/iommu_groups.  When allocated, each group is given a
> dynamically assign ID (int).  The ID is managed by the core IOMMU group
> code to support multiple heterogeneous iommu drivers, which could
> potentially collide in group naming/numbering.  This also keeps group
> IDs to small, easily managed values.  A directory is created under
> /sys/kernel/iommu_groups for each group.  A further subdirectory named
> "devices" contains links to each device within the group.  The iommu_group
> file in the device's sysfs directory, which formerly contained a group
> number when read, is now a link to the iommu group.  Example:

So first, I'm generally ok with the patch, I have a few comments mostly
for discussion and possible further improvements, but so far nothing
that can't be done via subsequent patches, so let's start with

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

Now:

How easy would it be add our own files there (in sysfs) ? I'm thinking
mostly for debug/diagnostic purposes it would be handy to show some HW
state related to the group or should I just add debugfs stuff
elsewhere ?

> This patch also extends the IOMMU API to allow attaching groups to
> domains.  This is currently a simple wrapper for iterating through
> devices within a group, but it's expected that the IOMMU API may
> eventually make groups a more integral part of domains.

I assume that by domains you mean "iommu domains" ? Just to be sure
because we also have PCI domains so it can be confusing :-)

> +/**
> + * iommu_group_alloc - Allocate a new group
> + * @name: Optional name to associate with group, visible in sysfs
> + *
> + * This function is called by an iommu driver to allocate a new iommu
> + * group.  The iommu group represents the minimum granularity of the iommu.
> + * Upon successful return, the caller holds a reference to the supplied
> + * group in order to hold the group until devices are added.  Use
> + * iommu_group_put() to release this extra reference count, allowing the
> + * group to be automatically reclaimed once it has no devices or external
> + * references.
> + */
> +struct iommu_group *iommu_group_alloc(void)
>  {
> -	unsigned int groupid;
> +	struct iommu_group *group;
> +	int ret;
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return ERR_PTR(-ENOMEM);
> +
> +	group->kobj.kset = iommu_group_kset;
> +	mutex_init(&group->mutex);
> +	INIT_LIST_HEAD(&group->devices);
> +	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> +
> +	mutex_lock(&iommu_group_mutex);
> +
> +again:
> +	if (unlikely(0 == ida_pre_get(&iommu_group_ida, GFP_KERNEL))) {
> +		kfree(group);
> +		mutex_unlock(&iommu_group_mutex);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	if (-EAGAIN == ida_get_new(&iommu_group_ida, &group->id))
> +		goto again;
> +
> +	mutex_unlock(&iommu_group_mutex);
>  
> -	if (iommu_device_group(dev, &groupid) == 0)
> -		return device_create_file(dev, &dev_attr_iommu_group);
> +	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
> +				   NULL, "%d", group->id);
> +	if (ret) {
> +		mutex_lock(&iommu_group_mutex);
> +		ida_remove(&iommu_group_ida, group->id);
> +		mutex_unlock(&iommu_group_mutex);
> +		kfree(group);
> +		return ERR_PTR(ret);
> +	}
> +
> +	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
> +	if (!group->devices_kobj) {
> +		kobject_put(&group->kobj); /* triggers .release & free */
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	/*
> +	 * The devices_kobj holds a reference on the group kobject, so
> +	 * as long as that exists so will the group.  We can therefore
> +	 * use the devices_kobj for reference counting.
> +	 */
> +	kobject_put(&group->kobj);
> +
> +	return group;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_alloc);
> +
> +/**
> + * iommu_group_get_iommudata - retrieve iommu_data registered for a group
> + * @group: the group
> + *
> + * iommu drivers can store data in the group for use when doing iommu
> + * operations.  This function provides a way to retrieve it.  Caller
> + * should hold a group reference.
> + */
> +void *iommu_group_get_iommudata(struct iommu_group *group)
> +{
> +	return group->iommu_data;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_get_iommudata);

That probably wants to be a static inline ? No biggie, could be done in
a followup patch if we really care.

> +/**
> + * iommu_group_set_iommudata - set iommu_data for a group
> + * @group: the group
> + * @iommu_data: new data
> + * @release: release function for iommu_data
> + *
> + * iommu drivers can store data in the group for use when doing iommu
> + * operations.  This function provides a way to set the data after
> + * the group has been allocated.  Caller should hold a group reference.
> + */
> +void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
> +			       void (*release)(void *iommu_data))
> +{
> +	group->iommu_data = iommu_data;
> +	group->iommu_data_release = release;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);
> +
> +/**
> + * iommu_group_set_name - set name for a group
> + * @group: the group
> + * @name: name
> + *
> + * Allow iommu driver to set a name for a group.  When set it will
> + * appear in a name attribute file under the group in sysfs.
> + */
> +int iommu_group_set_name(struct iommu_group *group, const char *name)
> +{
> +	int ret;
> +
> +	if (group->name) {
> +		iommu_group_remove_file(group, &iommu_group_attr_name);
> +		kfree(group->name);
> +		group->name = NULL;
> +		if (!name)
> +			return 0;
> +	}
> +
> +	group->name = kstrdup(name, GFP_KERNEL);
> +	if (!group->name)
> +		return -ENOMEM;
> +
> +	ret = iommu_group_create_file(group, &iommu_group_attr_name);
> +	if (ret) {
> +		kfree(group->name);
> +		group->name = NULL;
> +		return ret;
> +	}
>  
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(iommu_group_set_name);
>  
> -static int remove_iommu_group(struct device *dev)
> +/**
> + * iommu_group_add_device - add a device to an iommu group
> + * @group: the group into which to add the device (reference should be held)
> + * @dev: the device
> + *
> + * This function is called by an iommu driver to add a device into a
> + * group.  Adding a device increments the group reference count.
> + */
> +int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>  {
> -	unsigned int groupid;
> +	int ret, i = 0;
> +	struct iommu_device *device;
> +
> +	device = kzalloc(sizeof(*device), GFP_KERNEL);
> +	if (!device)
> +		return -ENOMEM;
> +
> +	device->dev = dev;
>  
> -	if (iommu_device_group(dev, &groupid) == 0)
> -		device_remove_file(dev, &dev_attr_iommu_group);
> +	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
> +	if (ret) {
> +		kfree(device);
> +		return ret;
> +	}
> +
> +	device->name = kasprintf(GFP_KERNEL, "%s", kobject_name(&dev->kobj));
> +rename:
> +	if (!device->name) {
> +		sysfs_remove_link(&dev->kobj, "iommu_group");
> +		kfree(device);
> +		return -ENOMEM;
> +	}
>  
> +	ret = sysfs_create_link_nowarn(group->devices_kobj,
> +				       &dev->kobj, device->name);
> +	if (ret) {
> +		kfree(device->name);
> +		if (ret == -EEXIST && i >= 0) {
> +			/*
> +			 * Account for the slim chance of collision
> +			 * and append an instance to the name.
> +			 */
> +			device->name = kasprintf(GFP_KERNEL, "%s.%d",
> +						 kobject_name(&dev->kobj), i++);
> +			goto rename;
> +		}
> +
> +		sysfs_remove_link(&dev->kobj, "iommu_group");
> +		kfree(device);
> +		return ret;
> +	}
> +
> +	kobject_get(group->devices_kobj);
> +
> +	dev->iommu_group = group;
> +
> +	mutex_lock(&group->mutex);
> +	list_add_tail(&device->list, &group->devices);
> +	mutex_unlock(&group->mutex);
> +
> +	/* Notify any listeners about change to group. */
> +	blocking_notifier_call_chain(&group->notifier,
> +				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(iommu_group_add_device);

There's of course a race here, not sure what we can do about it though
(if the device is removed before the notification is finalized). It
might not even be worth bothering. I suppose we assume the caller holds
a reference so the struct device itself won't go away until we have
returned anyway, however I worry that the "client" might end up getting
the remove notification before it gets the add :-)

Here too, something that we can sort out in a subsequent patch if worth
it.

Or we can just say that it's up to the callers (platform, hotplug
code, ...) to not call add and remove racily.

> -static int iommu_device_notifier(struct notifier_block *nb,
> -				 unsigned long action, void *data)
> +/**
> + * iommu_group_remove_device - remove a device from it's current group
> + * @dev: device to be removed
> + *
> + * This function is called by an iommu driver to remove the device from
> + * it's current group.  This decrements the iommu group reference count.
> + */
> +void iommu_group_remove_device(struct device *dev)
> +{
> +	struct iommu_group *group = dev->iommu_group;
> +	struct iommu_device *tmp_device, *device = NULL;
> +
> +	/* Pre-notify listeners that a device is being removed. */
> +	blocking_notifier_call_chain(&group->notifier,
> +				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(tmp_device, &group->devices, list) {
> +		if (tmp_device->dev == dev) {
> +			device = tmp_device;
> +			list_del(&device->list);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&group->mutex);
> +
> +	if (!device)
> +		return;
> +
> +	sysfs_remove_link(group->devices_kobj, device->name);
> +	sysfs_remove_link(&dev->kobj, "iommu_group");
> +
> +	kfree(device->name);
> +	kfree(device);
> +	dev->iommu_group = NULL;
> +	kobject_put(group->devices_kobj);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_remove_device);
> +
> +/**
> + * iommu_group_for_each_dev - iterate over each device in the group
> + * @group: the group
> + * @data: caller opaque data to be passed to callback function
> + * @fn: caller supplied callback function
> + *
> + * This function is called by group users to iterate over group devices.
> + * Callers should hold a reference count to the group during callback.
> + * The group->mutex is held across callbacks, which will block calls to
> + * iommu_group_add/remove_device.
> + */
> +int iommu_group_for_each_dev(struct iommu_group *group, void *data,
> +			     int (*fn)(struct device *, void *))
> +{
> +	struct iommu_device *device;
> +	int ret = 0;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list) {
> +		ret = fn(device->dev, data);
> +		if (ret)
> +			break;
> +	}
> +	mutex_unlock(&group->mutex);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_for_each_dev);
> +
> +/**
> + * iommu_group_get - Return the group for a device and increment reference
> + * @dev: get the group that this device belongs to
> + *
> + * This function is called by iommu drivers and users to get the group
> + * for the specified device.  If found, the group is returned and the group
> + * reference in incremented, else NULL.
> + */
> +struct iommu_group *iommu_group_get(struct device *dev)
> +{
> +	struct iommu_group *group = dev->iommu_group;
> +
> +	if (group)
> +		kobject_get(group->devices_kobj);
> +
> +	return group;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_get);
> +
> +/**
> + * iommu_group_put - Decrement group reference
> + * @group: the group to use
> + *
> + * This function is called by iommu drivers and users to release the
> + * iommu group.  Once the reference count is zero, the group is released.
> + */
> +void iommu_group_put(struct iommu_group *group)
> +{
> +	if (group)
> +		kobject_put(group->devices_kobj);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_put);
> +
> +/**
> + * iommu_group_register_notifier - Register a notifier for group changes
> + * @group: the group to watch
> + * @nb: notifier block to signal
> + *
> + * This function allows iommu group users to track changes in a group.
> + * See include/linux/iommu.h for actions sent via this notifier.  Caller
> + * should hold a reference to the group throughout notifier registration.
> + */
> +int iommu_group_register_notifier(struct iommu_group *group,
> +				  struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&group->notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_register_notifier);
> +
> +/**
> + * iommu_group_unregister_notifier - Unregister a notifier
> + * @group: the group to watch
> + * @nb: notifier block to signal
> + *
> + * Unregister a previously registered group notifier block.
> + */
> +int iommu_group_unregister_notifier(struct iommu_group *group,
> +				    struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_unregister(&group->notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> +
> +/**
> + * iommu_group_id - Return ID for a group
> + * @group: the group to ID
> + *
> + * Return the unique ID for the group matching the sysfs group number.
> + */
> +int iommu_group_id(struct iommu_group *group)
> +{
> +	return group->id;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_id);
> +
> +static int add_iommu_group(struct device *dev, void *data)
> +{
> +	struct iommu_ops *ops = data;
> +
> +	if (!ops->add_device)
> +		return -ENODEV;
> +
> +	WARN_ON(dev->iommu_group);
> +
> +	ops->add_device(dev);
> +
> +	return 0;
> +}
> +
> +static int iommu_bus_notifier(struct notifier_block *nb,
> +			      unsigned long action, void *data)
>  {
>  	struct device *dev = data;
> +	struct iommu_ops *ops = dev->bus->iommu_ops;
> +	struct iommu_group *group;
> +	unsigned long group_action = 0;
> +
> +	/*
> +	 * ADD/DEL call into iommu driver ops if provided, which may
> +	 * result in ADD/DEL notifiers to group->notifier
> +	 */
> +	if (action == BUS_NOTIFY_ADD_DEVICE) {
> +		if (ops->add_device)
> +			return ops->add_device(dev);
> +	} else if (action == BUS_NOTIFY_DEL_DEVICE) {
> +		if (ops->remove_device && dev->iommu_group) {
> +			ops->remove_device(dev);
> +			return 0;
> +		}
> +	}
>  
> -	if (action == BUS_NOTIFY_ADD_DEVICE)
> -		return add_iommu_group(dev, NULL);
> -	else if (action == BUS_NOTIFY_DEL_DEVICE)
> -		return remove_iommu_group(dev);
> +	/*
> +	 * Remaining BUS_NOTIFYs get filtered and republished to the
> +	 * group, if anyone is listening
> +	 */
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return 0;
>  
> +	switch (action) {
> +	case BUS_NOTIFY_BIND_DRIVER:
> +		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
> +		break;
> +	case BUS_NOTIFY_BOUND_DRIVER:
> +		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
> +		break;
> +	case BUS_NOTIFY_UNBIND_DRIVER:
> +		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
> +		break;
> +	case BUS_NOTIFY_UNBOUND_DRIVER:
> +		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
> +		break;
> +	}
> +
> +	if (group_action)
> +		blocking_notifier_call_chain(&group->notifier,
> +					     group_action, dev);
> +
> +	iommu_group_put(group);
>  	return 0;
>  }
>
> -static struct notifier_block iommu_device_nb = {
> -	.notifier_call = iommu_device_notifier,
> +static struct notifier_block iommu_bus_nb = {
> +	.notifier_call = iommu_bus_notifier,
>  };
>  
>  static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
>  {
> -	bus_register_notifier(bus, &iommu_device_nb);
> -	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
> +	bus_register_notifier(bus, &iommu_bus_nb);
> +	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
>  }

So if I understand correctly this is a rework of a piece of
infrastructure that powerpc doesn't use today, which uses the existing
"iommu_ops" to automatically signal the iommu of added/removed devices,
right ?

Do we need to warn somewhere that the above code is racy vs. concurrent
hotplug and thus might end up adding a device twice ? (It's up to
iommu->add_device implementation to then ensure it doesn't mess up I
assume).

>  /**
> @@ -192,6 +667,45 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_device);
>  
> +/*
> + * IOMMU groups are really the natrual working unit of the IOMMU, but
> + * the IOMMU API works on domains and devices.  Bridge that gap by
> + * iterating over the devices in a group.  Ideally we'd have a single
> + * device which represents the requestor ID of the group, but we also
> + * allow IOMMU drivers to create policy defined minimum sets, where
> + * the physical hardware may be able to distiguish members, but we
> + * wish to group them at a higher level (ex. untrusted multi-function
> + * PCI devices).  Thus we attach each device.
> + */
> +static int iommu_group_do_attach_device(struct device *dev, void *data)
> +{
> +	struct iommu_domain *domain = data;
> +
> +	return iommu_attach_device(domain, dev);
> +}
> +
> +int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
> +{
> +	return iommu_group_for_each_dev(group, domain,
> +					iommu_group_do_attach_device);
> +}
> +EXPORT_SYMBOL_GPL(iommu_attach_group);
> +
> +static int iommu_group_do_detach_device(struct device *dev, void *data)
> +{
> +	struct iommu_domain *domain = data;
> +
> +	iommu_detach_device(domain, dev);
> +
> +	return 0;
> +}
> +
> +void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
> +{
> +	iommu_group_for_each_dev(group, domain, iommu_group_do_detach_device);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_group);
> +

So as you probably are aware by now, we have a 1:1 group/domain
relationship on power (and don't implement the iommu API today) but I
have no objection with the API, I'll have to check how Alexey hooked our
code up (I haven't had a chance to look at it just yet).

>  phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
>  			       unsigned long iova)
>  {
> @@ -336,11 +850,15 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
>  
> -int iommu_device_group(struct device *dev, unsigned int *groupid)
> +static int __init iommu_init(void)
>  {
> -	if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group)
> -		return dev->bus->iommu_ops->device_group(dev, groupid);
> +	iommu_group_kset = kset_create_and_add("iommu_groups",
> +					       NULL, kernel_kobj);
> +	ida_init(&iommu_group_ida);
> +	mutex_init(&iommu_group_mutex);
>  
> -	return -ENODEV;
> +	BUG_ON(!iommu_group_kset);
> +
> +	return 0;
>  }
> -EXPORT_SYMBOL_GPL(iommu_device_group);
> +subsys_initcall(iommu_init);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 450293f..a71df92 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -26,6 +26,7 @@
>  #define IOMMU_CACHE	(4) /* DMA cache coherency */
>  
>  struct iommu_ops;
> +struct iommu_group;
>  struct bus_type;
>  struct device;
>  struct iommu_domain;
> @@ -60,6 +61,8 @@ struct iommu_domain {
>   * @iova_to_phys: translate iova to physical address
>   * @domain_has_cap: domain capabilities query
>   * @commit: commit iommu domain
> + * @add_device: add device to iommu grouping
> + * @remove_device: remove device from iommu grouping
>   * @pgsize_bitmap: bitmap of supported page sizes
>   */
>  struct iommu_ops {
> @@ -75,10 +78,18 @@ struct iommu_ops {
>  				    unsigned long iova);
>  	int (*domain_has_cap)(struct iommu_domain *domain,
>  			      unsigned long cap);
> -	int (*device_group)(struct device *dev, unsigned int *groupid);
> +	int (*add_device)(struct device *dev);
> +	void (*remove_device)(struct device *dev);
>  	unsigned long pgsize_bitmap;
>  };
>  
> +#define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
> +#define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
> +#define IOMMU_GROUP_NOTIFY_BIND_DRIVER		3 /* Pre Driver bind */
> +#define IOMMU_GROUP_NOTIFY_BOUND_DRIVER		4 /* Post Driver bind */
> +#define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER	5 /* Pre Driver unbind */
> +#define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER	6 /* Post Driver unbind */
> +
>  extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
>  extern bool iommu_present(struct bus_type *bus);
>  extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
> @@ -97,7 +108,29 @@ extern int iommu_domain_has_cap(struct iommu_domain *domain,
>  				unsigned long cap);
>  extern void iommu_set_fault_handler(struct iommu_domain *domain,
>  			iommu_fault_handler_t handler, void *token);
> -extern int iommu_device_group(struct device *dev, unsigned int *groupid);
> +
> +extern int iommu_attach_group(struct iommu_domain *domain,
> +			      struct iommu_group *group);
> +extern void iommu_detach_group(struct iommu_domain *domain,
> +			       struct iommu_group *group);
> +extern struct iommu_group *iommu_group_alloc(void);
> +extern void *iommu_group_get_iommudata(struct iommu_group *group);
> +extern void iommu_group_set_iommudata(struct iommu_group *group,
> +				      void *iommu_data,
> +				      void (*release)(void *iommu_data));
> +extern int iommu_group_set_name(struct iommu_group *group, const char *name);
> +extern int iommu_group_add_device(struct iommu_group *group,
> +				  struct device *dev);
> +extern void iommu_group_remove_device(struct device *dev);
> +extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
> +				    int (*fn)(struct device *, void *));
> +extern struct iommu_group *iommu_group_get(struct device *dev);
> +extern void iommu_group_put(struct iommu_group *group);
> +extern int iommu_group_register_notifier(struct iommu_group *group,
> +					 struct notifier_block *nb);
> +extern int iommu_group_unregister_notifier(struct iommu_group *group,
> +					   struct notifier_block *nb);
> +extern int iommu_group_id(struct iommu_group *group);
>  
>  /**
>   * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework
> @@ -142,6 +175,7 @@ static inline int report_iommu_fault(struct iommu_domain *domain,
>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> +struct iommu_group {};
>  
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> @@ -197,11 +231,75 @@ static inline void iommu_set_fault_handler(struct iommu_domain *domain,
>  {
>  }
>  
> -static inline int iommu_device_group(struct device *dev, unsigned int *groupid)
> +int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
> +{
> +	return -ENODEV;
> +}
> +
> +void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
> +{
> +}
> +
> +struct iommu_group *iommu_group_alloc(void)
> +{
> +	return ERR_PTR(-ENODEV);
> +}
> +
> +void *iommu_group_get_iommudata(struct iommu_group *group)
> +{
> +	return NULL;
> +}
> +
> +void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
> +			       void (*release)(void *iommu_data))
> +{
> +}
> +
> +int iommu_group_set_name(struct iommu_group *group, const char *name)
> +{
> +	return -ENODEV;
> +}
> +
> +int iommu_group_add_device(struct iommu_group *group, struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +void iommu_group_remove_device(struct device *dev)
> +{
> +}
> +
> +int iommu_group_for_each_dev(struct iommu_group *group, void *data,
> +			     int (*fn)(struct device *, void *))
> +{
> +	return -ENODEV;
> +}
> +
> +struct iommu_group *iommu_group_get(struct device *dev)
> +{
> +	return NULL;
> +}
> +
> +void iommu_group_put(struct iommu_group *group)
> +{
> +}
> +
> +int iommu_group_register_notifier(struct iommu_group *group,
> +				  struct notifier_block *nb)
>  {
>  	return -ENODEV;
>  }
>  
> +int iommu_group_unregister_notifier(struct iommu_group *group,
> +				    struct notifier_block *nb)
> +{
> +	return 0;
> +}
> +
> +int iommu_group_id(struct iommu_group *group)
> +{
> +	return -ENODEV;
> +}
>  #endif /* CONFIG_IOMMU_API */
>  
>  #endif /* __LINUX_IOMMU_H */
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/7] iommu: IOMMU Groups
  2012-06-20 10:01   ` Benjamin Herrenschmidt
@ 2012-06-20 16:48     ` Alex Williamson
  2012-06-21  0:07       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2012-06-20 16:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: joerg.roedel, dwmw2, iommu, bhelgaas, aik, david, konrad.wilk,
	linux-kernel, linux-pci, gregkh, ddutile, liuj97

On Wed, 2012-06-20 at 20:01 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-05-30 at 14:18 -0600, Alex Williamson wrote:
> 
> > IOMMU groups also include a userspace representation in sysfs under
> > /sys/kernel/iommu_groups.  When allocated, each group is given a
> > dynamically assign ID (int).  The ID is managed by the core IOMMU group
> > code to support multiple heterogeneous iommu drivers, which could
> > potentially collide in group naming/numbering.  This also keeps group
> > IDs to small, easily managed values.  A directory is created under
> > /sys/kernel/iommu_groups for each group.  A further subdirectory named
> > "devices" contains links to each device within the group.  The iommu_group
> > file in the device's sysfs directory, which formerly contained a group
> > number when read, is now a link to the iommu group.  Example:
> 
> So first, I'm generally ok with the patch, I have a few comments mostly
> for discussion and possible further improvements, but so far nothing
> that can't be done via subsequent patches, so let's start with
> 
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Thanks!

> ---
> 
> Now:
> 
> How easy would it be add our own files there (in sysfs) ? I'm thinking
> mostly for debug/diagnostic purposes it would be handy to show some HW
> state related to the group or should I just add debugfs stuff
> elsewhere ?

Well, you've got a name in sysfs that you can do whatever you want with.
You can update that as often as you like, with whatever you want.  Is
there a practical way to passthrough more attributes to the iommu
driver?

> > This patch also extends the IOMMU API to allow attaching groups to
> > domains.  This is currently a simple wrapper for iterating through
> > devices within a group, but it's expected that the IOMMU API may
> > eventually make groups a more integral part of domains.
> 
> I assume that by domains you mean "iommu domains" ? Just to be sure
> because we also have PCI domains so it can be confusing :-)

Yes, and yes it's confusing.  Just remember nothing about the IOMMU API
is PCI specific ;)

> > +/**
> > + * iommu_group_alloc - Allocate a new group
> > + * @name: Optional name to associate with group, visible in sysfs
> > + *
> > + * This function is called by an iommu driver to allocate a new iommu
> > + * group.  The iommu group represents the minimum granularity of the iommu.
> > + * Upon successful return, the caller holds a reference to the supplied
> > + * group in order to hold the group until devices are added.  Use
> > + * iommu_group_put() to release this extra reference count, allowing the
> > + * group to be automatically reclaimed once it has no devices or external
> > + * references.
> > + */
> > +struct iommu_group *iommu_group_alloc(void)
> >  {
> > -	unsigned int groupid;
> > +	struct iommu_group *group;
> > +	int ret;
> > +
> > +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> > +	if (!group)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	group->kobj.kset = iommu_group_kset;
> > +	mutex_init(&group->mutex);
> > +	INIT_LIST_HEAD(&group->devices);
> > +	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> > +
> > +	mutex_lock(&iommu_group_mutex);
> > +
> > +again:
> > +	if (unlikely(0 == ida_pre_get(&iommu_group_ida, GFP_KERNEL))) {
> > +		kfree(group);
> > +		mutex_unlock(&iommu_group_mutex);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	if (-EAGAIN == ida_get_new(&iommu_group_ida, &group->id))
> > +		goto again;
> > +
> > +	mutex_unlock(&iommu_group_mutex);
> >  
> > -	if (iommu_device_group(dev, &groupid) == 0)
> > -		return device_create_file(dev, &dev_attr_iommu_group);
> > +	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
> > +				   NULL, "%d", group->id);
> > +	if (ret) {
> > +		mutex_lock(&iommu_group_mutex);
> > +		ida_remove(&iommu_group_ida, group->id);
> > +		mutex_unlock(&iommu_group_mutex);
> > +		kfree(group);
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
> > +	if (!group->devices_kobj) {
> > +		kobject_put(&group->kobj); /* triggers .release & free */
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	/*
> > +	 * The devices_kobj holds a reference on the group kobject, so
> > +	 * as long as that exists so will the group.  We can therefore
> > +	 * use the devices_kobj for reference counting.
> > +	 */
> > +	kobject_put(&group->kobj);
> > +
> > +	return group;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_alloc);
> > +
> > +/**
> > + * iommu_group_get_iommudata - retrieve iommu_data registered for a group
> > + * @group: the group
> > + *
> > + * iommu drivers can store data in the group for use when doing iommu
> > + * operations.  This function provides a way to retrieve it.  Caller
> > + * should hold a group reference.
> > + */
> > +void *iommu_group_get_iommudata(struct iommu_group *group)
> > +{
> > +	return group->iommu_data;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_get_iommudata);
> 
> That probably wants to be a static inline ? No biggie, could be done in
> a followup patch if we really care.

The intention was to keep struct iommu_group private.  Anything outside
of iommu.c should just use it as an opaque object.  Exposing the struct
tempts other uses.

> > +/**
> > + * iommu_group_set_iommudata - set iommu_data for a group
> > + * @group: the group
> > + * @iommu_data: new data
> > + * @release: release function for iommu_data
> > + *
> > + * iommu drivers can store data in the group for use when doing iommu
> > + * operations.  This function provides a way to set the data after
> > + * the group has been allocated.  Caller should hold a group reference.
> > + */
> > +void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
> > +			       void (*release)(void *iommu_data))
> > +{
> > +	group->iommu_data = iommu_data;
> > +	group->iommu_data_release = release;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);
> > +
> > +/**
> > + * iommu_group_set_name - set name for a group
> > + * @group: the group
> > + * @name: name
> > + *
> > + * Allow iommu driver to set a name for a group.  When set it will
> > + * appear in a name attribute file under the group in sysfs.
> > + */
> > +int iommu_group_set_name(struct iommu_group *group, const char *name)
> > +{
> > +	int ret;
> > +
> > +	if (group->name) {
> > +		iommu_group_remove_file(group, &iommu_group_attr_name);
> > +		kfree(group->name);
> > +		group->name = NULL;
> > +		if (!name)
> > +			return 0;
> > +	}
> > +
> > +	group->name = kstrdup(name, GFP_KERNEL);
> > +	if (!group->name)
> > +		return -ENOMEM;
> > +
> > +	ret = iommu_group_create_file(group, &iommu_group_attr_name);
> > +	if (ret) {
> > +		kfree(group->name);
> > +		group->name = NULL;
> > +		return ret;
> > +	}
> >  
> >  	return 0;
> >  }
> > +EXPORT_SYMBOL_GPL(iommu_group_set_name);
> >  
> > -static int remove_iommu_group(struct device *dev)
> > +/**
> > + * iommu_group_add_device - add a device to an iommu group
> > + * @group: the group into which to add the device (reference should be held)
> > + * @dev: the device
> > + *
> > + * This function is called by an iommu driver to add a device into a
> > + * group.  Adding a device increments the group reference count.
> > + */
> > +int iommu_group_add_device(struct iommu_group *group, struct device *dev)
> >  {
> > -	unsigned int groupid;
> > +	int ret, i = 0;
> > +	struct iommu_device *device;
> > +
> > +	device = kzalloc(sizeof(*device), GFP_KERNEL);
> > +	if (!device)
> > +		return -ENOMEM;
> > +
> > +	device->dev = dev;
> >  
> > -	if (iommu_device_group(dev, &groupid) == 0)
> > -		device_remove_file(dev, &dev_attr_iommu_group);
> > +	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
> > +	if (ret) {
> > +		kfree(device);
> > +		return ret;
> > +	}
> > +
> > +	device->name = kasprintf(GFP_KERNEL, "%s", kobject_name(&dev->kobj));
> > +rename:
> > +	if (!device->name) {
> > +		sysfs_remove_link(&dev->kobj, "iommu_group");
> > +		kfree(device);
> > +		return -ENOMEM;
> > +	}
> >  
> > +	ret = sysfs_create_link_nowarn(group->devices_kobj,
> > +				       &dev->kobj, device->name);
> > +	if (ret) {
> > +		kfree(device->name);
> > +		if (ret == -EEXIST && i >= 0) {
> > +			/*
> > +			 * Account for the slim chance of collision
> > +			 * and append an instance to the name.
> > +			 */
> > +			device->name = kasprintf(GFP_KERNEL, "%s.%d",
> > +						 kobject_name(&dev->kobj), i++);
> > +			goto rename;
> > +		}
> > +
> > +		sysfs_remove_link(&dev->kobj, "iommu_group");
> > +		kfree(device);
> > +		return ret;
> > +	}
> > +
> > +	kobject_get(group->devices_kobj);
> > +
> > +	dev->iommu_group = group;
> > +
> > +	mutex_lock(&group->mutex);
> > +	list_add_tail(&device->list, &group->devices);
> > +	mutex_unlock(&group->mutex);
> > +
> > +	/* Notify any listeners about change to group. */
> > +	blocking_notifier_call_chain(&group->notifier,
> > +				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
> >  	return 0;
> >  }
> > +EXPORT_SYMBOL_GPL(iommu_group_add_device);
> 
> There's of course a race here, not sure what we can do about it though
> (if the device is removed before the notification is finalized). It
> might not even be worth bothering. I suppose we assume the caller holds
> a reference so the struct device itself won't go away until we have
> returned anyway, however I worry that the "client" might end up getting
> the remove notification before it gets the add :-)
> 
> Here too, something that we can sort out in a subsequent patch if worth
> it.
> 
> Or we can just say that it's up to the callers (platform, hotplug
> code, ...) to not call add and remove racily.

Yes, I was assuming the caller held a reference to the struct device to
prevent such a race, looks like I forgot to document that in the
comments.  I'll have to think about if we can fix the ordering problem.
We can re-order the list_add vs notification, but then we just risk
dropping the remove.  Perhaps we need to extend the lock or add another
to group {list add, notify add}, {list lookup, remove, notify remove}.
I'm not even sure this race is possible though w/ a device reference.

> > -static int iommu_device_notifier(struct notifier_block *nb,
> > -				 unsigned long action, void *data)
> > +/**
> > + * iommu_group_remove_device - remove a device from it's current group
> > + * @dev: device to be removed
> > + *
> > + * This function is called by an iommu driver to remove the device from
> > + * it's current group.  This decrements the iommu group reference count.
> > + */
> > +void iommu_group_remove_device(struct device *dev)
> > +{
> > +	struct iommu_group *group = dev->iommu_group;
> > +	struct iommu_device *tmp_device, *device = NULL;
> > +
> > +	/* Pre-notify listeners that a device is being removed. */
> > +	blocking_notifier_call_chain(&group->notifier,
> > +				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
> > +
> > +	mutex_lock(&group->mutex);
> > +	list_for_each_entry(tmp_device, &group->devices, list) {
> > +		if (tmp_device->dev == dev) {
> > +			device = tmp_device;
> > +			list_del(&device->list);
> > +			break;
> > +		}
> > +	}
> > +	mutex_unlock(&group->mutex);
> > +
> > +	if (!device)
> > +		return;
> > +
> > +	sysfs_remove_link(group->devices_kobj, device->name);
> > +	sysfs_remove_link(&dev->kobj, "iommu_group");
> > +
> > +	kfree(device->name);
> > +	kfree(device);
> > +	dev->iommu_group = NULL;
> > +	kobject_put(group->devices_kobj);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_remove_device);
> > +
> > +/**
> > + * iommu_group_for_each_dev - iterate over each device in the group
> > + * @group: the group
> > + * @data: caller opaque data to be passed to callback function
> > + * @fn: caller supplied callback function
> > + *
> > + * This function is called by group users to iterate over group devices.
> > + * Callers should hold a reference count to the group during callback.
> > + * The group->mutex is held across callbacks, which will block calls to
> > + * iommu_group_add/remove_device.
> > + */
> > +int iommu_group_for_each_dev(struct iommu_group *group, void *data,
> > +			     int (*fn)(struct device *, void *))
> > +{
> > +	struct iommu_device *device;
> > +	int ret = 0;
> > +
> > +	mutex_lock(&group->mutex);
> > +	list_for_each_entry(device, &group->devices, list) {
> > +		ret = fn(device->dev, data);
> > +		if (ret)
> > +			break;
> > +	}
> > +	mutex_unlock(&group->mutex);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_for_each_dev);
> > +
> > +/**
> > + * iommu_group_get - Return the group for a device and increment reference
> > + * @dev: get the group that this device belongs to
> > + *
> > + * This function is called by iommu drivers and users to get the group
> > + * for the specified device.  If found, the group is returned and the group
> > + * reference in incremented, else NULL.
> > + */
> > +struct iommu_group *iommu_group_get(struct device *dev)
> > +{
> > +	struct iommu_group *group = dev->iommu_group;
> > +
> > +	if (group)
> > +		kobject_get(group->devices_kobj);
> > +
> > +	return group;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_get);
> > +
> > +/**
> > + * iommu_group_put - Decrement group reference
> > + * @group: the group to use
> > + *
> > + * This function is called by iommu drivers and users to release the
> > + * iommu group.  Once the reference count is zero, the group is released.
> > + */
> > +void iommu_group_put(struct iommu_group *group)
> > +{
> > +	if (group)
> > +		kobject_put(group->devices_kobj);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_put);
> > +
> > +/**
> > + * iommu_group_register_notifier - Register a notifier for group changes
> > + * @group: the group to watch
> > + * @nb: notifier block to signal
> > + *
> > + * This function allows iommu group users to track changes in a group.
> > + * See include/linux/iommu.h for actions sent via this notifier.  Caller
> > + * should hold a reference to the group throughout notifier registration.
> > + */
> > +int iommu_group_register_notifier(struct iommu_group *group,
> > +				  struct notifier_block *nb)
> > +{
> > +	return blocking_notifier_chain_register(&group->notifier, nb);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_register_notifier);
> > +
> > +/**
> > + * iommu_group_unregister_notifier - Unregister a notifier
> > + * @group: the group to watch
> > + * @nb: notifier block to signal
> > + *
> > + * Unregister a previously registered group notifier block.
> > + */
> > +int iommu_group_unregister_notifier(struct iommu_group *group,
> > +				    struct notifier_block *nb)
> > +{
> > +	return blocking_notifier_chain_unregister(&group->notifier, nb);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> > +
> > +/**
> > + * iommu_group_id - Return ID for a group
> > + * @group: the group to ID
> > + *
> > + * Return the unique ID for the group matching the sysfs group number.
> > + */
> > +int iommu_group_id(struct iommu_group *group)
> > +{
> > +	return group->id;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_id);
> > +
> > +static int add_iommu_group(struct device *dev, void *data)
> > +{
> > +	struct iommu_ops *ops = data;
> > +
> > +	if (!ops->add_device)
> > +		return -ENODEV;
> > +
> > +	WARN_ON(dev->iommu_group);
> > +
> > +	ops->add_device(dev);
> > +
> > +	return 0;
> > +}
> > +
> > +static int iommu_bus_notifier(struct notifier_block *nb,
> > +			      unsigned long action, void *data)
> >  {
> >  	struct device *dev = data;
> > +	struct iommu_ops *ops = dev->bus->iommu_ops;
> > +	struct iommu_group *group;
> > +	unsigned long group_action = 0;
> > +
> > +	/*
> > +	 * ADD/DEL call into iommu driver ops if provided, which may
> > +	 * result in ADD/DEL notifiers to group->notifier
> > +	 */
> > +	if (action == BUS_NOTIFY_ADD_DEVICE) {
> > +		if (ops->add_device)
> > +			return ops->add_device(dev);
> > +	} else if (action == BUS_NOTIFY_DEL_DEVICE) {
> > +		if (ops->remove_device && dev->iommu_group) {
> > +			ops->remove_device(dev);
> > +			return 0;
> > +		}
> > +	}
> >  
> > -	if (action == BUS_NOTIFY_ADD_DEVICE)
> > -		return add_iommu_group(dev, NULL);
> > -	else if (action == BUS_NOTIFY_DEL_DEVICE)
> > -		return remove_iommu_group(dev);
> > +	/*
> > +	 * Remaining BUS_NOTIFYs get filtered and republished to the
> > +	 * group, if anyone is listening
> > +	 */
> > +	group = iommu_group_get(dev);
> > +	if (!group)
> > +		return 0;
> >  
> > +	switch (action) {
> > +	case BUS_NOTIFY_BIND_DRIVER:
> > +		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
> > +		break;
> > +	case BUS_NOTIFY_BOUND_DRIVER:
> > +		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
> > +		break;
> > +	case BUS_NOTIFY_UNBIND_DRIVER:
> > +		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
> > +		break;
> > +	case BUS_NOTIFY_UNBOUND_DRIVER:
> > +		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
> > +		break;
> > +	}
> > +
> > +	if (group_action)
> > +		blocking_notifier_call_chain(&group->notifier,
> > +					     group_action, dev);
> > +
> > +	iommu_group_put(group);
> >  	return 0;
> >  }
> >
> > -static struct notifier_block iommu_device_nb = {
> > -	.notifier_call = iommu_device_notifier,
> > +static struct notifier_block iommu_bus_nb = {
> > +	.notifier_call = iommu_bus_notifier,
> >  };
> >  
> >  static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
> >  {
> > -	bus_register_notifier(bus, &iommu_device_nb);
> > -	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
> > +	bus_register_notifier(bus, &iommu_bus_nb);
> > +	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> >  }
> 
> So if I understand correctly this is a rework of a piece of
> infrastructure that powerpc doesn't use today, which uses the existing
> "iommu_ops" to automatically signal the iommu of added/removed devices,
> right ?

Right, and add_device/remove_device are optional in the struct
iommu_ops.  amd_iommu already has a bus notifier, so I don't try to
replace it with this.  intel-iommu creates iommu domain dynamically, so
it does use this to enumerate devices for iommu groups.

> Do we need to warn somewhere that the above code is racy vs. concurrent
> hotplug and thus might end up adding a device twice ? (It's up to
> iommu->add_device implementation to then ensure it doesn't mess up I
> assume).

Is it sufficient to test !dev->iommu_group before calling
iommu->add_device?  We already do this on the DEL path.  I can follow-up
with a patch for that.


> >  /**
> > @@ -192,6 +667,45 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_detach_device);
> >  
> > +/*
> > + * IOMMU groups are really the natrual working unit of the IOMMU, but
> > + * the IOMMU API works on domains and devices.  Bridge that gap by
> > + * iterating over the devices in a group.  Ideally we'd have a single
> > + * device which represents the requestor ID of the group, but we also
> > + * allow IOMMU drivers to create policy defined minimum sets, where
> > + * the physical hardware may be able to distiguish members, but we
> > + * wish to group them at a higher level (ex. untrusted multi-function
> > + * PCI devices).  Thus we attach each device.
> > + */
> > +static int iommu_group_do_attach_device(struct device *dev, void *data)
> > +{
> > +	struct iommu_domain *domain = data;
> > +
> > +	return iommu_attach_device(domain, dev);
> > +}
> > +
> > +int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
> > +{
> > +	return iommu_group_for_each_dev(group, domain,
> > +					iommu_group_do_attach_device);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_attach_group);
> > +
> > +static int iommu_group_do_detach_device(struct device *dev, void *data)
> > +{
> > +	struct iommu_domain *domain = data;
> > +
> > +	iommu_detach_device(domain, dev);
> > +
> > +	return 0;
> > +}
> > +
> > +void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
> > +{
> > +	iommu_group_for_each_dev(group, domain, iommu_group_do_detach_device);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_detach_group);
> > +
> 
> So as you probably are aware by now, we have a 1:1 group/domain
> relationship on power (and don't implement the iommu API today) but I
> have no objection with the API, I'll have to check how Alexey hooked our
> code up (I haven't had a chance to look at it just yet).

Yes, I've tried to design it for both.  I expect your iommu driver to
reject adding a device to a domain where it doesn't belong and I think
this is how Alexey has coded it.  You really want that protection anyway
I think, so this just takes advantage of that failing.  Thanks for the
review!

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/7] iommu: IOMMU Groups
  2012-06-20 16:48     ` Alex Williamson
@ 2012-06-21  0:07       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 21+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-21  0:07 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joerg.roedel, dwmw2, iommu, bhelgaas, aik, david, konrad.wilk,
	linux-kernel, linux-pci, gregkh, ddutile, liuj97

On Wed, 2012-06-20 at 10:48 -0600, Alex Williamson wrote:

> Yes, I was assuming the caller held a reference to the struct device to
> prevent such a race, looks like I forgot to document that in the
> comments.  I'll have to think about if we can fix the ordering problem.
> We can re-order the list_add vs notification, but then we just risk
> dropping the remove.  Perhaps we need to extend the lock or add another
> to group {list add, notify add}, {list lookup, remove, notify remove}.
> I'm not even sure this race is possible though w/ a device reference.

Or we put the burden on the callers not to racily add & remove,
including full completion of related notifiers. Might not even be hard
(ie might already be the case).

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] IOMMU: Groups support
  2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
                   ` (7 preceding siblings ...)
  2012-06-06 11:05 ` [PATCH v2 0/7] IOMMU: Groups support Joerg Roedel
@ 2012-06-25 11:29 ` Joerg Roedel
  8 siblings, 0 replies; 21+ messages in thread
From: Joerg Roedel @ 2012-06-25 11:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: dwmw2, iommu, bhelgaas, benh, aik, david, konrad.wilk,
	linux-kernel, linux-pci, gregkh, ddutile, liuj97

On Wed, May 30, 2012 at 02:18:29PM -0600, Alex Williamson wrote:
> Alex Williamson (7):
>       iommu: Remove group_mf
>       intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups
>       amd_iommu: Make use of DMA quirks and ACS checks in IOMMU groups
>       intel-iommu: Support IOMMU groups
>       amd_iommu: Support IOMMU groups
>       iommu: IOMMU Groups
>       driver core: Add iommu_group tracking to struct device
> 
> 
>  .../ABI/testing/sysfs-kernel-iommu_groups          |   14 
>  Documentation/kernel-parameters.txt                |    1 
>  arch/ia64/include/asm/iommu.h                      |    2 
>  arch/ia64/kernel/pci-dma.c                         |    1 
>  arch/x86/include/asm/iommu.h                       |    1 
>  arch/x86/kernel/pci-dma.c                          |   11 
>  drivers/iommu/amd_iommu.c                          |   70 ++
>  drivers/iommu/intel-iommu.c                        |   89 ++-
>  drivers/iommu/iommu.c                              |  578 +++++++++++++++++++-
>  include/linux/device.h                             |    2 
>  include/linux/iommu.h                              |  104 +++-
>  11 files changed, 767 insertions(+), 106 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups

Applied and pushed all of them. Thanks a lot, Alex.

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-06-25 11:29 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-30 20:18 [PATCH v2 0/7] IOMMU: Groups support Alex Williamson
2012-05-30 20:18 ` [PATCH v2 1/7] driver core: Add iommu_group tracking to struct device Alex Williamson
2012-06-20  5:45   ` Benjamin Herrenschmidt
2012-05-30 20:18 ` [PATCH v2 2/7] iommu: IOMMU Groups Alex Williamson
2012-06-20 10:01   ` Benjamin Herrenschmidt
2012-06-20 16:48     ` Alex Williamson
2012-06-21  0:07       ` Benjamin Herrenschmidt
2012-05-30 20:19 ` [PATCH v2 3/7] amd_iommu: Support IOMMU groups Alex Williamson
2012-05-30 20:19 ` [PATCH v2 4/7] intel-iommu: " Alex Williamson
2012-05-30 20:19 ` [PATCH v2 5/7] amd_iommu: Make use of DMA quirks and ACS checks in " Alex Williamson
2012-05-30 20:19 ` [PATCH v2 6/7] intel-iommu: " Alex Williamson
2012-05-31 19:47   ` Don Dutile
2012-05-31 20:41     ` Alex Williamson
2012-05-30 20:19 ` [PATCH v2 7/7] iommu: Remove group_mf Alex Williamson
2012-06-06 11:05 ` [PATCH v2 0/7] IOMMU: Groups support Joerg Roedel
2012-06-11 15:37   ` Alex Williamson
2012-06-12  2:55     ` Bjorn Helgaas
2012-06-18 17:31     ` Alex Williamson
2012-06-19  9:49       ` Joerg Roedel
2012-06-19 10:40         ` Benjamin Herrenschmidt
2012-06-25 11:29 ` Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).