* [RFC PATCH 0/3] IOMMU groups
@ 2012-04-02 21:14 ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-02 21:14 UTC (permalink / raw)
  To: david, dwmw2
  Cc: alex.williamson, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

This series attempts to make IOMMU device grouping a slightly more
integral part of the device model.  iommu_device_groups were originally
introduced to support the VFIO user space driver interface which needs
to understand the granularity of device isolation in order to ensure
security of devices when assigned for user access.  This information
was provided via a simple group identifier from the IOMMU driver, allowing
VFIO to walk devices and assemble groups itself.

The feedback received from this was that groups should be the effective
unit of work for the IOMMU API.  The existing model of allowing domains
to be created and individual devices attached ignores many of the
restrictions of the IOMMU, whether by design, by topology or by defective
devices.  Additionally we should be able to use the grouping information
at the dma ops layer for managing domains and quirking devices.

This series is a sketch implementing only those aspects, leaving
everything else about the multifaceted hairball of Isolation groups for
another API.  Please comment and let me know if this seems like the
direction we should be headed.  Thanks,

Alex


---

Alex Williamson (3):
      iommu: Create attach/detach group interface
      iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d
      iommu: Introduce iommu_group


 drivers/iommu/amd_iommu.c   |   50 ++++++----
 drivers/iommu/intel-iommu.c |   76 ++++++++--------
 drivers/iommu/iommu.c       |  210 ++++++++++++++++++++++++++++++++++++++-----
 include/linux/device.h      |    2 
 include/linux/iommu.h       |   43 +++++++++
 5 files changed, 301 insertions(+), 80 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 1/3] iommu: Introduce iommu_group
@ 2012-04-02 21:14   ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-02 21:14 UTC (permalink / raw)
  To: david, dwmw2
  Cc: alex.williamson, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

IOMMUs often do not have visibility of individual devices in the
system.  Due to IOMMU design, bus topology, or device quirks, we
can often only identify groups of devices.  Examples include
Intel VT-d & AMD-Vi, which often have function-level visibility,
compared to POWER partitionable endpoints, which have bridge-level
granularity.  PCIe-to-PCI bridges also often cloud IOMMU
visibility, since the IOMMU cannot distinguish devices behind the bridge.
Devices can also sometimes hurt themselves by initiating DMA using
the wrong source ID on a multifunction PCI device.

IOMMU groups are meant to help solve these problems and hopefully
become the working unit of the IOMMU API.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 include/linux/device.h |    2 ++
 include/linux/iommu.h  |    5 +++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index b63fb39..6acab1c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -35,6 +35,7 @@ struct subsys_private;
 struct bus_type;
 struct device_node;
 struct iommu_ops;
+struct iommu_group;
 
 struct bus_attribute {
 	struct attribute	attr;
@@ -683,6 +684,7 @@ struct device {
 	const struct attribute_group **groups;	/* optional groups */
 
 	void	(*release)(struct device *dev);
+	struct iommu_group	*iommu_group;
 };
 
 /* Get the wakeup routines, which depend on struct device */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d937580..2ee375c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -26,6 +26,7 @@
 #define IOMMU_CACHE	(4) /* DMA cache coherency */
 
 struct iommu_ops;
+struct iommu_group;
 struct bus_type;
 struct device;
 struct iommu_domain;
@@ -78,6 +79,9 @@ struct iommu_ops {
 	unsigned long pgsize_bitmap;
 };
 
+struct iommu_group {
+};
+
 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
 extern bool iommu_present(struct bus_type *bus);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
@@ -140,6 +144,7 @@ static inline int report_iommu_fault(struct iommu_domain *domain,
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
+struct iommu_group {};
 
 static inline bool iommu_present(struct bus_type *bus)
 {


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 2/3] iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d
@ 2012-04-02 21:14   ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-02 21:14 UTC (permalink / raw)
  To: david, dwmw2
  Cc: alex.williamson, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

IOMMU groups define the minimum granularity of the IOMMU.  We therefore
create groups using a dma_dev, which provides the effective requestor ID for
the entire group.  Additional devices can be added to groups based on
system topology, IOMMU limitations, or device quirks.

This implementation also includes a simple idr database for creating
a flat address space of group IDs across multiple IOMMUs.  Updates are
included for Intel VT-d, using example iommu callbacks for adding and
removing devices, as well as AMD-Vi, which tracks devices on its own.
We should be able to better integrate the iommu_group within existing
AMD-Vi structs or provide a private data location within the iommu_group
where we can eventually reduce redundancy.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/iommu/amd_iommu.c   |   50 ++++++-----
 drivers/iommu/intel-iommu.c |   76 +++++++++--------
 drivers/iommu/iommu.c       |  198 ++++++++++++++++++++++++++++++++++++++-----
 include/linux/iommu.h       |   23 +++++
 4 files changed, 267 insertions(+), 80 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f75e060..876db28 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -256,9 +256,10 @@ static bool check_device(struct device *dev)
 
 static int iommu_init_device(struct device *dev)
 {
-	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
 	struct iommu_dev_data *dev_data;
 	u16 alias;
+	int ret;
 
 	if (dev->archdata.iommu)
 		return 0;
@@ -279,6 +280,26 @@ static int iommu_init_device(struct device *dev)
 			return -ENOTSUPP;
 		}
 		dev_data->alias_data = alias_data;
+
+		dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
+	} else
+		dma_pdev = pdev;
+
+	/* dma_pdev = iommu_pci_quirk(dma_pdev) */
+
+	if (!dma_pdev->dev.iommu_group) {
+		struct iommu_group *group;
+
+		group = iommu_group_alloc(&dma_pdev->dev);
+		if (IS_ERR(group))
+			return PTR_ERR(group);
+	}
+
+	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
+	if (ret) {
+		if (iommu_group_empty(dma_pdev->dev.iommu_group))
+			iommu_group_free(dma_pdev->dev.iommu_group);
+		return ret;
 	}
 
 	if (pci_iommuv2_capable(pdev)) {
@@ -309,6 +330,12 @@ static void iommu_ignore_device(struct device *dev)
 
 static void iommu_uninit_device(struct device *dev)
 {
+	struct iommu_group *group = dev->iommu_group;
+
+	iommu_group_remove_device(dev);
+	if (iommu_group_empty(group))
+		iommu_group_free(group);
+
 	/*
 	 * Nothing to do here - we keep dev_data around for unplugged devices
 	 * and reuse it when the device is re-plugged - not doing so would
@@ -3191,26 +3218,6 @@ static int amd_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
-static int amd_iommu_device_group(struct device *dev, unsigned int *groupid)
-{
-	struct iommu_dev_data *dev_data = dev->archdata.iommu;
-	struct pci_dev *pdev = to_pci_dev(dev);
-	u16 devid;
-
-	if (!dev_data)
-		return -ENODEV;
-
-	if (pdev->is_virtfn || !iommu_group_mf)
-		devid = dev_data->devid;
-	else
-		devid = calc_devid(pdev->bus->number,
-				   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
-
-	*groupid = amd_iommu_alias_table[devid];
-
-	return 0;
-}
-
 static struct iommu_ops amd_iommu_ops = {
 	.domain_init = amd_iommu_domain_init,
 	.domain_destroy = amd_iommu_domain_destroy,
@@ -3220,7 +3227,6 @@ static struct iommu_ops amd_iommu_ops = {
 	.unmap = amd_iommu_unmap,
 	.iova_to_phys = amd_iommu_iova_to_phys,
 	.domain_has_cap = amd_iommu_domain_has_cap,
-	.device_group = amd_iommu_device_group,
 	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
 };
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c9c6053..41ab7d0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4075,54 +4075,59 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
-/*
- * Group numbers are arbitrary.  Device with the same group number
- * indicate the iommu cannot differentiate between them.  To avoid
- * tracking used groups we just use the seg|bus|devfn of the lowest
- * level we're able to differentiate devices
- */
-static int intel_iommu_device_group(struct device *dev, unsigned int *groupid)
+static int intel_iommu_add_device(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
-	struct pci_dev *bridge;
-	union {
-		struct {
-			u8 devfn;
-			u8 bus;
-			u16 segment;
-		} pci;
-		u32 group;
-	} id;
-
-	if (iommu_no_mapping(dev))
-		return -ENODEV;
-
-	id.pci.segment = pci_domain_nr(pdev->bus);
-	id.pci.bus = pdev->bus->number;
-	id.pci.devfn = pdev->devfn;
+	struct pci_dev *bridge, *dma_pdev = pdev;
+	int ret;
 
-	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
+	if (!device_to_iommu(pci_domain_nr(pdev->bus),
+			     pdev->bus->number, pdev->devfn))
 		return -ENODEV;
 
 	bridge = pci_find_upstream_pcie_bridge(pdev);
 	if (bridge) {
-		if (pci_is_pcie(bridge)) {
-			id.pci.bus = bridge->subordinate->number;
-			id.pci.devfn = 0;
-		} else {
-			id.pci.bus = bridge->bus->number;
-			id.pci.devfn = bridge->devfn;
-		}
+		if (pci_is_pcie(bridge))
+			dma_pdev = pci_get_domain_bus_and_slot(
+						pci_domain_nr(pdev->bus),
+						bridge->subordinate->number, 0);
+		else
+			dma_pdev = bridge;
+	}
+
+	if (!bridge && !pdev->is_virtfn && iommu_group_mf)
+		dma_pdev = pci_get_slot(dma_pdev->bus, 0);
+
+	/* dma_pdev = iommu_pci_quirk(dma_pdev); */
+
+	if (!dma_pdev->dev.iommu_group) {
+		struct iommu_group *group;
+
+		group = iommu_group_alloc(&dma_pdev->dev);
+		if (IS_ERR(group))
+			return PTR_ERR(group);
 	}
 
-	if (!pdev->is_virtfn && iommu_group_mf)
-		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
+	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
+	if (ret) {
+		if (iommu_group_empty(dma_pdev->dev.iommu_group))
+			iommu_group_free(dma_pdev->dev.iommu_group);
 
-	*groupid = id.group;
+		return ret;
+	}
 
 	return 0;
 }
 
+static void intel_iommu_remove_device(struct device *dev)
+{
+	struct iommu_group *group = dev->iommu_group;
+
+	iommu_group_remove_device(dev);
+	if (iommu_group_empty(group))
+		iommu_group_free(group);
+}
+
 static struct iommu_ops intel_iommu_ops = {
 	.domain_init	= intel_iommu_domain_init,
 	.domain_destroy = intel_iommu_domain_destroy,
@@ -4132,7 +4137,8 @@ static struct iommu_ops intel_iommu_ops = {
 	.unmap		= intel_iommu_unmap,
 	.iova_to_phys	= intel_iommu_iova_to_phys,
 	.domain_has_cap = intel_iommu_domain_has_cap,
-	.device_group	= intel_iommu_device_group,
+	.add_device	= intel_iommu_add_device,
+	.remove_device	= intel_iommu_remove_device,
 	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
 };
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2198b2d..6a51ed9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -26,48 +26,96 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/iommu.h>
+#include <linux/idr.h>
 
-static ssize_t show_iommu_group(struct device *dev,
-				struct device_attribute *attr, char *buf)
+static struct kset *iommu_group_kset;
+static struct idr iommu_group_idr;
+static struct mutex iommu_group_mutex;
+
+struct iommu_group_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct iommu_group *group, char *buf);
+	ssize_t (*store)(struct iommu_group *group,
+			 const char *buf, size_t count);
+};
+
+#define to_iommu_group_attr(_attr)	\
+	container_of(_attr, struct iommu_group_attribute, attr)
+#define to_iommu_group(_kobj)		\
+	container_of(_kobj, struct iommu_group, kobj)
+
+static ssize_t iommu_group_attr_show(struct kobject *kobj,
+				     struct attribute *__attr, char *buf)
 {
-	unsigned int groupid;
+	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
+	struct iommu_group *group = to_iommu_group(kobj);
+	ssize_t ret = -EIO;
 
-	if (iommu_device_group(dev, &groupid))
-		return 0;
+	if (attr->show)
+		ret = attr->show(group, buf);
+	return ret;
+}
+
+static ssize_t iommu_group_attr_store(struct kobject *kobj,
+				      struct attribute *__attr,
+				      const char *buf, size_t count)
+{
+	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
+	struct iommu_group *group = to_iommu_group(kobj);
+	ssize_t ret = -EIO;
 
-	return sprintf(buf, "%u", groupid);
+	if (attr->store)
+		ret = attr->store(group, buf, count);
+	return ret;
 }
-static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
 
-static int add_iommu_group(struct device *dev, void *data)
+static void iommu_group_release(struct kobject *kobj)
 {
-	unsigned int groupid;
+	struct iommu_group *group = to_iommu_group(kobj);
 
-	if (iommu_device_group(dev, &groupid) == 0)
-		return device_create_file(dev, &dev_attr_iommu_group);
+	mutex_lock(&iommu_group_mutex);
+	idr_remove(&iommu_group_idr, group->id);
+	mutex_unlock(&iommu_group_mutex);
 
-	return 0;
+	kfree(group);
 }
 
-static int remove_iommu_group(struct device *dev)
+static const struct sysfs_ops iommu_group_sysfs_ops = {
+	.show = iommu_group_attr_show,
+	.store = iommu_group_attr_store,
+};
+
+static struct kobj_type iommu_group_ktype = {
+	.sysfs_ops = &iommu_group_sysfs_ops,
+	.release = iommu_group_release,
+};
+
+static int add_iommu_group(struct device *dev, void *data)
 {
-	unsigned int groupid;
+	struct iommu_ops *ops = data;
 
-	if (iommu_device_group(dev, &groupid) == 0)
-		device_remove_file(dev, &dev_attr_iommu_group);
+	if (!ops->add_device)
+		return -ENODEV;
 
-	return 0;
+	WARN_ON(dev->iommu_group);
+
+	return ops->add_device(dev);
 }
 
 static int iommu_device_notifier(struct notifier_block *nb,
 				 unsigned long action, void *data)
 {
 	struct device *dev = data;
+	struct iommu_ops *ops = dev->bus->iommu_ops;
 
 	if (action == BUS_NOTIFY_ADD_DEVICE)
-		return add_iommu_group(dev, NULL);
-	else if (action == BUS_NOTIFY_DEL_DEVICE)
-		return remove_iommu_group(dev);
+		return add_iommu_group(dev, ops);
+	else if (action == BUS_NOTIFY_DEL_DEVICE) {
+		if (ops->remove_device && dev->iommu_group) {
+			ops->remove_device(dev);
+			return 0;
+		}
+	}
 
 	return 0;
 }
@@ -79,7 +127,97 @@ static struct notifier_block iommu_device_nb = {
 static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
 {
 	bus_register_notifier(bus, &iommu_device_nb);
-	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
+	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
+}
+
+struct iommu_group *iommu_group_alloc(struct device *dma_dev)
+{
+	struct iommu_group *group;
+	int ret;
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group)
+		return ERR_PTR(-ENOMEM);
+
+	group->kobj.kset = iommu_group_kset;
+	group->dma_dev = dma_dev;
+	atomic_set(&group->refcnt, 0);
+
+	mutex_lock(&iommu_group_mutex);
+
+again:
+	if (unlikely(0 == idr_pre_get(&iommu_group_idr, GFP_KERNEL))) {
+		kfree(group);
+		mutex_unlock(&iommu_group_mutex);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (-EAGAIN == idr_get_new(&iommu_group_idr, group, &group->id))
+		goto again;
+
+	mutex_unlock(&iommu_group_mutex);
+
+	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
+				   NULL, "%d", group->id);
+	if (ret) {
+		iommu_group_release(&group->kobj);
+		return ERR_PTR(ret);
+	}
+
+	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
+	if (!group->devices_kobj) {
+		kobject_put(&group->kobj);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	dma_dev->iommu_group = group;
+
+	return group;
+}
+
+void iommu_group_free(struct iommu_group *group)
+{
+	group->dma_dev->iommu_group = NULL;
+	WARN_ON(atomic_read(&group->refcnt));
+	kobject_put(group->devices_kobj);
+	kobject_put(&group->kobj);
+}
+
+bool iommu_group_empty(struct iommu_group *group)
+{
+	return (0 == atomic_read(&group->refcnt));
+}
+
+int iommu_group_add_device(struct iommu_group *group, struct device *dev)
+{
+	int ret;
+
+	atomic_inc(&group->refcnt);
+
+	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
+	if (ret)
+		return ret;
+
+	ret = sysfs_create_link(group->devices_kobj, &dev->kobj,
+                                kobject_name(&dev->kobj));
+	if (ret) {
+		sysfs_remove_link(&dev->kobj, "iommu_group");
+		return ret;
+	}
+
+	dev->iommu_group = group;
+
+	return 0;
+}
+
+void iommu_group_remove_device(struct device *dev)
+{
+	sysfs_remove_link(dev->iommu_group->devices_kobj,
+			  kobject_name(&dev->kobj));
+	sysfs_remove_link(&dev->kobj, "iommu_group");
+
+	atomic_dec(&dev->iommu_group->refcnt);
+	dev->iommu_group = NULL;
 }
 
 /**
@@ -335,9 +473,23 @@ EXPORT_SYMBOL_GPL(iommu_unmap);
 
 int iommu_device_group(struct device *dev, unsigned int *groupid)
 {
-	if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group)
-		return dev->bus->iommu_ops->device_group(dev, groupid);
+	if (dev->iommu_group) {
+		*groupid = dev->iommu_group->id;
+		return 0;
+	}
 
 	return -ENODEV;
 }
 EXPORT_SYMBOL_GPL(iommu_device_group);
+
+static int __init iommu_init(void)
+{
+	iommu_group_kset = kset_create_and_add("iommu_groups", NULL, NULL);
+	idr_init(&iommu_group_idr);
+	mutex_init(&iommu_group_mutex);
+
+	BUG_ON(!iommu_group_kset);
+
+	return 0;
+}
+subsys_initcall(iommu_init);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2ee375c..24004d6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,6 +60,8 @@ struct iommu_domain {
  * @iova_to_phys: translate iova to physical address
  * @domain_has_cap: domain capabilities query
  * @commit: commit iommu domain
+ * @add_device: add device to iommu grouping
+ * @remove_device: remove device from iommu grouping
  * @pgsize_bitmap: bitmap of supported page sizes
  */
 struct iommu_ops {
@@ -76,10 +78,25 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	int (*device_group)(struct device *dev, unsigned int *groupid);
+	int (*add_device)(struct device *dev);
+	void (*remove_device)(struct device *dev);
 	unsigned long pgsize_bitmap;
 };
 
+/**
+ * struct iommu_group - groups of devices representing iommu visibility
+ * @dma_dev: all dma from the group appears to the iommu using this source id
+ * @kobj: iommu group node in sysfs
+ * @devices_kobj: sysfs subdir node for linking devices
+ * @refcnt: number of devices in group
+ * @id: unique id number for the group (for easy sysfs listing)
+ */
 struct iommu_group {
+	struct device *dma_dev;
+	struct kobject kobj;
+	struct kobject *devices_kobj;
+	atomic_t refcnt;
+	int id;
 };
 
 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -101,6 +118,12 @@ extern int iommu_domain_has_cap(struct iommu_domain *domain,
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 					iommu_fault_handler_t handler);
 extern int iommu_device_group(struct device *dev, unsigned int *groupid);
+extern struct iommu_group *iommu_group_alloc(struct device *dev);
+extern void iommu_group_free(struct iommu_group *group);
+extern bool iommu_group_empty(struct iommu_group *group);
+extern int iommu_group_add_device(struct iommu_group *group,
+				  struct device *dev);
+extern void iommu_group_remove_device(struct device *dev);
 
 /**
  * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework


^ permalink raw reply related	[flat|nested] 21+ messages in thread

@@ -4075,54 +4075,59 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
-/*
- * Group numbers are arbitrary.  Device with the same group number
- * indicate the iommu cannot differentiate between them.  To avoid
- * tracking used groups we just use the seg|bus|devfn of the lowest
- * level we're able to differentiate devices
- */
-static int intel_iommu_device_group(struct device *dev, unsigned int *groupid)
+static int intel_iommu_add_device(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
-	struct pci_dev *bridge;
-	union {
-		struct {
-			u8 devfn;
-			u8 bus;
-			u16 segment;
-		} pci;
-		u32 group;
-	} id;
-
-	if (iommu_no_mapping(dev))
-		return -ENODEV;
-
-	id.pci.segment = pci_domain_nr(pdev->bus);
-	id.pci.bus = pdev->bus->number;
-	id.pci.devfn = pdev->devfn;
+	struct pci_dev *bridge, *dma_pdev = pdev;
+	int ret;
 
-	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
+	if (!device_to_iommu(pci_domain_nr(pdev->bus),
+			     pdev->bus->number, pdev->devfn))
 		return -ENODEV;
 
 	bridge = pci_find_upstream_pcie_bridge(pdev);
 	if (bridge) {
-		if (pci_is_pcie(bridge)) {
-			id.pci.bus = bridge->subordinate->number;
-			id.pci.devfn = 0;
-		} else {
-			id.pci.bus = bridge->bus->number;
-			id.pci.devfn = bridge->devfn;
-		}
+		if (pci_is_pcie(bridge))
+			dma_pdev = pci_get_domain_bus_and_slot(
+						pci_domain_nr(pdev->bus),
+						bridge->subordinate->number, 0);
+		else
+			dma_pdev = bridge;
+	}
+
+	if (!bridge && !pdev->is_virtfn && iommu_group_mf)
+		dma_pdev = pci_get_slot(dma_pdev->bus, 0);
+
+	/* dma_pdev = iommu_pci_quirk(dma_pdev); */
+
+	if (!dma_pdev->dev.iommu_group) {
+		struct iommu_group *group;
+
+		group = iommu_group_alloc(&dma_pdev->dev);
+		if (IS_ERR(group))
+			return PTR_ERR(group);
 	}
 
-	if (!pdev->is_virtfn && iommu_group_mf)
-		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
+	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
+	if (ret) {
+		if (iommu_group_empty(dma_pdev->dev.iommu_group))
+			iommu_group_free(dma_pdev->dev.iommu_group);
 
-	*groupid = id.group;
+		return ret;
+	}
 
 	return 0;
 }
 
+static void intel_iommu_remove_device(struct device *dev)
+{
+	struct iommu_group *group = dev->iommu_group;
+
+	iommu_group_remove_device(dev);
+	if (iommu_group_empty(group))
+		iommu_group_free(group);
+}
+
 static struct iommu_ops intel_iommu_ops = {
 	.domain_init	= intel_iommu_domain_init,
 	.domain_destroy = intel_iommu_domain_destroy,
@@ -4132,7 +4137,8 @@ static struct iommu_ops intel_iommu_ops = {
 	.unmap		= intel_iommu_unmap,
 	.iova_to_phys	= intel_iommu_iova_to_phys,
 	.domain_has_cap = intel_iommu_domain_has_cap,
-	.device_group	= intel_iommu_device_group,
+	.add_device	= intel_iommu_add_device,
+	.remove_device	= intel_iommu_remove_device,
 	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
 };
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2198b2d..6a51ed9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -26,48 +26,96 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/iommu.h>
+#include <linux/idr.h>
 
-static ssize_t show_iommu_group(struct device *dev,
-				struct device_attribute *attr, char *buf)
+static struct kset *iommu_group_kset;
+static struct idr iommu_group_idr;
+static struct mutex iommu_group_mutex;
+
+struct iommu_group_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct iommu_group *group, char *buf);
+	ssize_t (*store)(struct iommu_group *group,
+			 const char *buf, size_t count);
+};
+
+#define to_iommu_group_attr(_attr)	\
+	container_of(_attr, struct iommu_group_attribute, attr)
+#define to_iommu_group(_kobj)		\
+	container_of(_kobj, struct iommu_group, kobj)
+
+static ssize_t iommu_group_attr_show(struct kobject *kobj,
+				     struct attribute *__attr, char *buf)
 {
-	unsigned int groupid;
+	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
+	struct iommu_group *group = to_iommu_group(kobj);
+	ssize_t ret = -EIO;
 
-	if (iommu_device_group(dev, &groupid))
-		return 0;
+	if (attr->show)
+		ret = attr->show(group, buf);
+	return ret;
+}
+
+static ssize_t iommu_group_attr_store(struct kobject *kobj,
+				      struct attribute *__attr,
+				      const char *buf, size_t count)
+{
+	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
+	struct iommu_group *group = to_iommu_group(kobj);
+	ssize_t ret = -EIO;
 
-	return sprintf(buf, "%u", groupid);
+	if (attr->store)
+		ret = attr->store(group, buf, count);
+	return ret;
 }
-static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
 
-static int add_iommu_group(struct device *dev, void *data)
+static void iommu_group_release(struct kobject *kobj)
 {
-	unsigned int groupid;
+	struct iommu_group *group = to_iommu_group(kobj);
 
-	if (iommu_device_group(dev, &groupid) == 0)
-		return device_create_file(dev, &dev_attr_iommu_group);
+	mutex_lock(&iommu_group_mutex);
+	idr_remove(&iommu_group_idr, group->id);
+	mutex_unlock(&iommu_group_mutex);
 
-	return 0;
+	kfree(group);
 }
 
-static int remove_iommu_group(struct device *dev)
+static const struct sysfs_ops iommu_group_sysfs_ops = {
+	.show = iommu_group_attr_show,
+	.store = iommu_group_attr_store,
+};
+
+static struct kobj_type iommu_group_ktype = {
+	.sysfs_ops = &iommu_group_sysfs_ops,
+	.release = iommu_group_release,
+};
+
+static int add_iommu_group(struct device *dev, void *data)
 {
-	unsigned int groupid;
+	struct iommu_ops *ops = data;
 
-	if (iommu_device_group(dev, &groupid) == 0)
-		device_remove_file(dev, &dev_attr_iommu_group);
+	if (!ops->add_device)
+		return -ENODEV;
 
-	return 0;
+	WARN_ON(dev->iommu_group);
+
+	return ops->add_device(dev);
 }
 
 static int iommu_device_notifier(struct notifier_block *nb,
 				 unsigned long action, void *data)
 {
 	struct device *dev = data;
+	struct iommu_ops *ops = dev->bus->iommu_ops;
 
 	if (action == BUS_NOTIFY_ADD_DEVICE)
-		return add_iommu_group(dev, NULL);
-	else if (action == BUS_NOTIFY_DEL_DEVICE)
-		return remove_iommu_group(dev);
+		return add_iommu_group(dev, ops);
+	else if (action == BUS_NOTIFY_DEL_DEVICE) {
+		if (ops->remove_device && dev->iommu_group) {
+			ops->remove_device(dev);
+			return 0;
+		}
+	}
 
 	return 0;
 }
@@ -79,7 +127,97 @@ static struct notifier_block iommu_device_nb = {
 static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
 {
 	bus_register_notifier(bus, &iommu_device_nb);
-	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
+	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
+}
+
+struct iommu_group *iommu_group_alloc(struct device *dma_dev)
+{
+	struct iommu_group *group;
+	int ret;
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group)
+		return ERR_PTR(-ENOMEM);
+
+	group->kobj.kset = iommu_group_kset;
+	group->dma_dev = dma_dev;
+	atomic_set(&group->refcnt, 0);
+
+	mutex_lock(&iommu_group_mutex);
+
+again:
+	if (unlikely(0 == idr_pre_get(&iommu_group_idr, GFP_KERNEL))) {
+		kfree(group);
+		mutex_unlock(&iommu_group_mutex);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (-EAGAIN == idr_get_new(&iommu_group_idr, group, &group->id))
+		goto again;
+
+	mutex_unlock(&iommu_group_mutex);
+
+	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
+				   NULL, "%d", group->id);
+	if (ret) {
+		iommu_group_release(&group->kobj);
+		return ERR_PTR(ret);
+	}
+
+	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
+        if (!group->devices_kobj) {
+		kobject_put(&group->kobj);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	dma_dev->iommu_group = group;
+
+	return group;
+}
+
+void iommu_group_free(struct iommu_group *group)
+{
+	group->dma_dev->iommu_group = NULL;
+	WARN_ON(atomic_read(&group->refcnt));
+	kobject_put(group->devices_kobj);
+	kobject_put(&group->kobj);
+}
+
+bool iommu_group_empty(struct iommu_group *group)
+{
+	return (0 == atomic_read(&group->refcnt));
+}
+
+int iommu_group_add_device(struct iommu_group *group, struct device *dev)
+{
+	int ret;
+
+	atomic_inc(&group->refcnt);
+
+	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
+	if (ret)
+		return ret;
+
+	ret = sysfs_create_link(group->devices_kobj, &dev->kobj,
+                                kobject_name(&dev->kobj));
+	if (ret) {
+		sysfs_remove_link(&dev->kobj, "iommu_group");
+		return ret;
+	}
+
+	dev->iommu_group = group;
+
+	return 0;
+}
+
+void iommu_group_remove_device(struct device *dev)
+{
+	sysfs_remove_link(dev->iommu_group->devices_kobj,
+			  kobject_name(&dev->kobj));
+	sysfs_remove_link(&dev->kobj, "iommu_group");
+
+	atomic_dec(&dev->iommu_group->refcnt);
+	dev->iommu_group = NULL;
 }
 
 /**
@@ -335,9 +473,23 @@ EXPORT_SYMBOL_GPL(iommu_unmap);
 
 int iommu_device_group(struct device *dev, unsigned int *groupid)
 {
-	if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group)
-		return dev->bus->iommu_ops->device_group(dev, groupid);
+	if (dev->iommu_group) {
+		*groupid = dev->iommu_group->id;
+		return 0;
+	}
 
 	return -ENODEV;
 }
 EXPORT_SYMBOL_GPL(iommu_device_group);
+
+static int __init iommu_init(void)
+{
+	iommu_group_kset = kset_create_and_add("iommu_groups", NULL, NULL);
+	idr_init(&iommu_group_idr);
+	mutex_init(&iommu_group_mutex);
+
+	BUG_ON(!iommu_group_kset);
+
+	return 0;
+}
+subsys_initcall(iommu_init);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2ee375c..24004d6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,6 +60,8 @@ struct iommu_domain {
  * @iova_to_phys: translate iova to physical address
  * @domain_has_cap: domain capabilities query
  * @commit: commit iommu domain
+ * @add_device: add device to iommu grouping
+ * @remove_device: remove device from iommu grouping
  * @pgsize_bitmap: bitmap of supported page sizes
  */
 struct iommu_ops {
@@ -76,10 +78,25 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	int (*device_group)(struct device *dev, unsigned int *groupid);
+	int (*add_device)(struct device *dev);
+	void (*remove_device)(struct device *dev);
 	unsigned long pgsize_bitmap;
 };
 
+/**
+ * struct iommu_group - groups of devices representing iommu visibility
+ * @dma_dev: all dma from the group appears to the iommu using this source id
+ * @kobj: iommu group node in sysfs
+ * @devices_kobj: sysfs subdir node for linking devices
+ * @refcnt: number of devices in group
+ * @id: unique id number for the group (for easy sysfs listing)
+ */
 struct iommu_group {
+	struct device *dma_dev;
+	struct kobject kobj;
+	struct kobject *devices_kobj;
+	atomic_t refcnt;
+	int id;
 };
 
 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -101,6 +118,12 @@ extern int iommu_domain_has_cap(struct iommu_domain *domain,
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 					iommu_fault_handler_t handler);
 extern int iommu_device_group(struct device *dev, unsigned int *groupid);
+extern struct iommu_group *iommu_group_alloc(struct device *dev);
+extern void iommu_group_free(struct iommu_group *group);
+extern bool iommu_group_empty(struct iommu_group *group);
+extern int iommu_group_add_device(struct iommu_group *group,
+				  struct device *dev);
+extern void iommu_group_remove_device(struct device *dev);
 
 /**
  * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 3/3] iommu: Create attach/detach group interface
@ 2012-04-02 21:14   ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-02 21:14 UTC (permalink / raw)
  To: david, dwmw2
  Cc: alex.williamson, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

IOMMU ops should work at the group level rather than the device
level, as we cannot arbitrarily assign devices from the same group
to different domains.  For now this is just a simple wrapper that
makes use of the dma_dev within a group.
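
The intended semantics can be sketched in a hypothetical userspace
model (names invented for illustration, not the kernel API): attaching
a group reduces to attaching its dma_dev, since that device carries the
requestor ID the IOMMU sees for every member of the group:

```c
/* Hypothetical stand-ins for the kernel structures. */
struct model_dev {
	int id;
};

struct model_group {
	struct model_dev *dma_dev;	/* source id for the whole group */
};

struct model_domain {
	int attached_dev;		/* -1 when nothing attached */
};

static int model_attach_device(struct model_domain *d, struct model_dev *dev)
{
	d->attached_dev = dev->id;
	return 0;
}

/* Mirrors the wrapper: group attach forwards to the group's dma_dev. */
static int model_attach_group(struct model_domain *d, struct model_group *g)
{
	return model_attach_device(d, g->dma_dev);
}
```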

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/iommu/iommu.c |   12 ++++++++++++
 include/linux/iommu.h |   15 +++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6a51ed9..af77da1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -327,6 +327,18 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_device);
 
+int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+	return iommu_attach_device(domain, group->dma_dev);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_group);
+
+void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+	iommu_detach_device(domain, group->dma_dev);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_group);
+
 phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 			       unsigned long iova)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 24004d6..d4e9a7a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -107,6 +107,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_attach_group(struct iommu_domain *domain,
+			      struct iommu_group *group);
+extern void iommu_detach_group(struct iommu_domain *domain,
+			       struct iommu_group *group);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
@@ -194,6 +198,17 @@ static inline void iommu_detach_device(struct iommu_domain *domain,
 {
 }
 
+static inline int iommu_attach_group(struct iommu_domain *domain,
+				     struct iommu_group *group)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_detach_group(struct iommu_domain *domain,
+				      struct iommu_group *group)
+{
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 			    phys_addr_t paddr, int gfp_order, int prot)
 {


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [RFC PATCH 3/3] iommu: Create attach/detach group interface
@ 2012-04-02 21:14   ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-02 21:14 UTC (permalink / raw)
  To: david, dwmw2
  Cc: kvm, joerg.roedel, linux-kernel, iommu, qemu-devel, alex.williamson

IOMMU ops should work at the group level rather than the device
level, since we cannot arbitrarily assign devices from the same group
to different domains.  For now this is just a simple wrapper that
makes use of the dma_dev within a group.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 drivers/iommu/iommu.c |   12 ++++++++++++
 include/linux/iommu.h |   15 +++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6a51ed9..af77da1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -327,6 +327,18 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_device);
 
+int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+	return iommu_attach_device(domain, group->dma_dev);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_group);
+
+void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
+{
+	iommu_detach_device(domain, group->dma_dev);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_group);
+
 phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 			       unsigned long iova)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 24004d6..d4e9a7a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -107,6 +107,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_attach_group(struct iommu_domain *domain,
+			      struct iommu_group *group);
+extern void iommu_detach_group(struct iommu_domain *domain,
+			       struct iommu_group *group);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
@@ -194,6 +198,17 @@ static inline void iommu_detach_device(struct iommu_domain *domain,
 {
 }
 
+static inline int iommu_attach_group(struct iommu_domain *domain,
+				     struct iommu_group *group)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_detach_group(struct iommu_domain *domain,
+				      struct iommu_group *group)
+{
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 			    phys_addr_t paddr, int gfp_order, int prot)
 {

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 0/3] IOMMU groups
@ 2012-04-10 21:03   ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-10 21:03 UTC (permalink / raw)
  To: david, dwmw2; +Cc: iommu, qemu-devel, joerg.roedel, kvm, linux-kernel


Ping.  Does this approach look like it could satisfy your desire for a
more integrated group layer?  I'd really like to move VFIO forward;
we've been stalled on this long enough.  David Woodhouse, I think this
provides the quirking you're looking for for devices like the Ricoh; do
you have any other requirements for a group layer?  Thanks,

Alex

On Mon, 2012-04-02 at 15:14 -0600, Alex Williamson wrote:
> This series attempts to make IOMMU device grouping a slightly more
> integral part of the device model.  iommu_device_groups were originally
> introduced to support the VFIO user space driver interface which needs
> to understand the granularity of device isolation in order to ensure
> security of devices when assigned for user access.  This information
> was provided via a simple group identifier from the IOMMU driver allowing
> VFIO to walk devices and assemble groups itself.
> 
> The feedback received from this was that groups should be the effective
> unit of work for the IOMMU API.  The existing model of allowing domains
> to be created and individual devices attached ignores many of the
> restrictions of the IOMMU, whether by design, by topology or by defective
> devices.  Additionally we should be able to use the grouping information
> at the dma ops layer for managing domains and quirking devices.
> 
> This series is a sketch at implementing only those aspects and leaving
> everything else about the multifaceted hairball of Isolation groups for
> another API.  Please comment and let me know if this seems like the
> direction we should be headed.  Thanks,
> 
> Alex
> 
> 
> ---
> 
> Alex Williamson (3):
>       iommu: Create attach/detach group interface
>       iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d
>       iommu: Introduce iommu_group
> 
> 
>  drivers/iommu/amd_iommu.c   |   50 ++++++----
>  drivers/iommu/intel-iommu.c |   76 ++++++++--------
>  drivers/iommu/iommu.c       |  210 ++++++++++++++++++++++++++++++++++++++-----
>  include/linux/device.h      |    2 
>  include/linux/iommu.h       |   43 +++++++++
>  5 files changed, 301 insertions(+), 80 deletions(-)




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 1/3] iommu: Introduce iommu_group
@ 2012-04-18  9:58     ` David Gibson
  0 siblings, 0 replies; 21+ messages in thread
From: David Gibson @ 2012-04-18  9:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dwmw2, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

On Mon, Apr 02, 2012 at 03:14:40PM -0600, Alex Williamson wrote:
> IOMMUs often do not have visibility of individual devices in the
> system.  Due to IOMMU design, bus topology, or device quirks, we
> can often only identify groups of devices.  Examples include
> Intel VT-d & AMD-Vi which often have function level visibility
> compared to POWER partitionable endpoints which have bridge level
> granularity.

That's a significant oversimplification of the situation on POWER,
although it doesn't really matter in this context.  On older (i.e. pre
PCI-E) hardware, PEs have either host bridge (i.e. domain)
granularity or, IIUC, in some cases p2p bridge granularity, using
special p2p bridges, since that's the only real way to do iommu
differentiation without the PCI-E requestor IDs.  This isn't as coarse
as it seems in practice, because the hardware is usually built with a
bridge per physical PCI slot.

On newer PCI-E hardware, the PE granularity is basically a firmware
decision, and can go down to function level.  I believe pHyp puts the
granularity at the bridge level.  Our non-virtualized Linux "firmware"
currently does put it at the function level, but Ben is thinking about
changing that to bridge level: again, because of the hardware design
that isn't as coarse as it seems, and at this level we can hardware
guarantee isolation to a degree that's not possible at the function
level.

>  PCIe-to-PCI bridges also often cloud the IOMMU
> visibility, as it cannot distinguish devices behind the bridge.
> Devices can also sometimes hurt themselves by initiating DMA using
> the wrong source ID on a multifunction PCI device.
> 
> IOMMU groups are meant to help solve these problems and hopefully
> become the working unit of the IOMMU API.

So far, so simple.  No objections here.  I am trying to work out what
the real difference in approach is in this series from either your or
my earlier isolation group series.  AFAICT it's just that this
approach is explicitly only about IOMMU identity, ignoring (here) any
other factors which might affect isolation.  Or am I missing
something?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 2/3] iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d
  2012-04-02 21:14   ` Alex Williamson
  (?)
  (?)
@ 2012-04-18 11:55   ` David Gibson
  2012-04-18 20:28       ` Alex Williamson
  -1 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2012-04-18 11:55 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dwmw2, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

On Mon, Apr 02, 2012 at 03:14:46PM -0600, Alex Williamson wrote:
> IOMMU groups define the minimum granularity of the IOMMU.  We therefore
> create groups using a dma_dev which is the effective requestor ID for
> the entire group.  Additional devices can be added to groups based on
> system topology, IOMMU limitations, or device quirks.
> 
> This implementation also includes a simple idr database for creating
> a flat address space of group IDs across multiple IOMMUs.  Updates
> included for Intel VT-d, using example iommu callbacks for adding and
> removing devices, as well as AMD-Vi, which tracks devices on its own.
> We should be able to better integrate the iommu_group within existing
> AMD-Vi structs or provide a private data location within the iommu_group
> where we can eventually reduce redundancy.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Looks reasonable as far as it goes.  This still lacks an obvious means
for doing group assignment of devices on busses subordinate to devices
that are on iommu-managed busses, which, as we discussed earlier, is a
bit of a can of worms, but necessary.

> ---
> 
>  drivers/iommu/amd_iommu.c   |   50 ++++++-----
>  drivers/iommu/intel-iommu.c |   76 +++++++++--------
>  drivers/iommu/iommu.c       |  198 ++++++++++++++++++++++++++++++++++++++-----
>  include/linux/iommu.h       |   23 +++++
>  4 files changed, 267 insertions(+), 80 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index f75e060..876db28 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -256,9 +256,10 @@ static bool check_device(struct device *dev)
>  
>  static int iommu_init_device(struct device *dev)
>  {
> -	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
>  	struct iommu_dev_data *dev_data;
>  	u16 alias;
> +	int ret;
>  
>  	if (dev->archdata.iommu)
>  		return 0;
> @@ -279,6 +280,26 @@ static int iommu_init_device(struct device *dev)
>  			return -ENOTSUPP;
>  		}
>  		dev_data->alias_data = alias_data;
> +
> +		dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
> +	} else
> +		dma_pdev = pdev;
> +
> +	/* dma_pdev = iommu_pci_quirk(dma_pdev) */

Presumably an actual implementation of the quirk is coming?  It might
be an idea for it to take both the individual and representative
devices, in case that information is useful.

> +	if (!dma_pdev->dev.iommu_group) {
> +		struct iommu_group *group;
> +
> +		group = iommu_group_alloc(&dma_pdev->dev);
> +		if (IS_ERR(group))
> +			return PTR_ERR(group);
> +	}
> +
> +	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
> +	if (ret) {
> +		if (iommu_group_empty(dma_pdev->dev.iommu_group))
> +			iommu_group_free(dma_pdev->dev.iommu_group);
> +		return ret;

It might be nice to have generic helpers that do this kind of lifetime
handling of the groups, but that's a detail.

>  	}
>  
>  	if (pci_iommuv2_capable(pdev)) {
> @@ -309,6 +330,12 @@ static void iommu_ignore_device(struct device *dev)
>  
>  static void iommu_uninit_device(struct device *dev)
>  {
> +	struct iommu_group *group = dev->iommu_group;
> +
> +	iommu_group_remove_device(dev);
> +	if (iommu_group_empty(group))
> +		iommu_group_free(group);
> +
>  	/*
>  	 * Nothing to do here - we keep dev_data around for unplugged devices
>  	 * and reuse it when the device is re-plugged - not doing so would
> @@ -3191,26 +3218,6 @@ static int amd_iommu_domain_has_cap(struct iommu_domain *domain,
>  	return 0;
>  }
>  
> -static int amd_iommu_device_group(struct device *dev, unsigned int *groupid)
> -{
> -	struct iommu_dev_data *dev_data = dev->archdata.iommu;
> -	struct pci_dev *pdev = to_pci_dev(dev);
> -	u16 devid;
> -
> -	if (!dev_data)
> -		return -ENODEV;
> -
> -	if (pdev->is_virtfn || !iommu_group_mf)
> -		devid = dev_data->devid;
> -	else
> -		devid = calc_devid(pdev->bus->number,
> -				   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
> -
> -	*groupid = amd_iommu_alias_table[devid];
> -
> -	return 0;
> -}
> -
>  static struct iommu_ops amd_iommu_ops = {
>  	.domain_init = amd_iommu_domain_init,
>  	.domain_destroy = amd_iommu_domain_destroy,
> @@ -3220,7 +3227,6 @@ static struct iommu_ops amd_iommu_ops = {
>  	.unmap = amd_iommu_unmap,
>  	.iova_to_phys = amd_iommu_iova_to_phys,
>  	.domain_has_cap = amd_iommu_domain_has_cap,
> -	.device_group = amd_iommu_device_group,
>  	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
>  };
>  
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c9c6053..41ab7d0 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -4075,54 +4075,59 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
>  	return 0;
>  }
>  
> -/*
> - * Group numbers are arbitrary.  Device with the same group number
> - * indicate the iommu cannot differentiate between them.  To avoid
> - * tracking used groups we just use the seg|bus|devfn of the lowest
> - * level we're able to differentiate devices
> - */
> -static int intel_iommu_device_group(struct device *dev, unsigned int *groupid)
> +static int intel_iommu_add_device(struct device *dev)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev);
> -	struct pci_dev *bridge;
> -	union {
> -		struct {
> -			u8 devfn;
> -			u8 bus;
> -			u16 segment;
> -		} pci;
> -		u32 group;
> -	} id;
> -
> -	if (iommu_no_mapping(dev))
> -		return -ENODEV;
> -
> -	id.pci.segment = pci_domain_nr(pdev->bus);
> -	id.pci.bus = pdev->bus->number;
> -	id.pci.devfn = pdev->devfn;
> +	struct pci_dev *bridge, *dma_pdev = pdev;
> +	int ret;
>  
> -	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
> +	if (!device_to_iommu(pci_domain_nr(pdev->bus),
> +			     pdev->bus->number, pdev->devfn))
>  		return -ENODEV;
>  
>  	bridge = pci_find_upstream_pcie_bridge(pdev);
>  	if (bridge) {
> -		if (pci_is_pcie(bridge)) {
> -			id.pci.bus = bridge->subordinate->number;
> -			id.pci.devfn = 0;
> -		} else {
> -			id.pci.bus = bridge->bus->number;
> -			id.pci.devfn = bridge->devfn;
> -		}
> +		if (pci_is_pcie(bridge))
> +			dma_pdev = pci_get_domain_bus_and_slot(
> +						pci_domain_nr(pdev->bus),
> +						bridge->subordinate->number, 0);
> +		else
> +			dma_pdev = bridge;
> +	}
> +
> +	if (!bridge && !pdev->is_virtfn && iommu_group_mf)
> +		dma_pdev = pci_get_slot(dma_pdev->bus, 0);
> +
> +	/* dma_pdev = iommu_pci_quirk(dma_pdev); */
> +
> +	if (!dma_pdev->dev.iommu_group) {
> +		struct iommu_group *group;
> +
> +		group = iommu_group_alloc(&dma_pdev->dev);
> +		if (IS_ERR(group))
> +			return PTR_ERR(group);
>  	}
>  
> -	if (!pdev->is_virtfn && iommu_group_mf)
> -		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
> +	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
> +	if (ret) {
> +		if (iommu_group_empty(dma_pdev->dev.iommu_group))
> +			iommu_group_free(dma_pdev->dev.iommu_group);
>  
> -	*groupid = id.group;
> +		return ret;
> +	}
>  
>  	return 0;
>  }
>  
> +static void intel_iommu_remove_device(struct device *dev)
> +{
> +	struct iommu_group *group = dev->iommu_group;
> +
> +	iommu_group_remove_device(dev);
> +	if (iommu_group_empty(group))
> +		iommu_group_free(group);
> +}
> +
>  static struct iommu_ops intel_iommu_ops = {
>  	.domain_init	= intel_iommu_domain_init,
>  	.domain_destroy = intel_iommu_domain_destroy,
> @@ -4132,7 +4137,8 @@ static struct iommu_ops intel_iommu_ops = {
>  	.unmap		= intel_iommu_unmap,
>  	.iova_to_phys	= intel_iommu_iova_to_phys,
>  	.domain_has_cap = intel_iommu_domain_has_cap,
> -	.device_group	= intel_iommu_device_group,
> +	.add_device	= intel_iommu_add_device,
> +	.remove_device	= intel_iommu_remove_device,
>  	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
>  };
>  
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2198b2d..6a51ed9 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -26,48 +26,96 @@
>  #include <linux/slab.h>
>  #include <linux/errno.h>
>  #include <linux/iommu.h>
> +#include <linux/idr.h>
>  
> -static ssize_t show_iommu_group(struct device *dev,
> -				struct device_attribute *attr, char *buf)
> +static struct kset *iommu_group_kset;
> +static struct idr iommu_group_idr;
> +static struct mutex iommu_group_mutex;
> +
> +struct iommu_group_attribute {
> +	struct attribute attr;
> +	ssize_t (*show)(struct iommu_group *group, char *buf);
> +	ssize_t (*store)(struct iommu_group *group,
> +			 const char *buf, size_t count);
> +};
> +
> +#define to_iommu_group_attr(_attr)	\
> +	container_of(_attr, struct iommu_group_attribute, attr)
> +#define to_iommu_group(_kobj)		\
> +	container_of(_kobj, struct iommu_group, kobj)
> +
> +static ssize_t iommu_group_attr_show(struct kobject *kobj,
> +				     struct attribute *__attr, char *buf)
>  {
> -	unsigned int groupid;
> +	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
> +	struct iommu_group *group = to_iommu_group(kobj);
> +	ssize_t ret = -EIO;
>  
> -	if (iommu_device_group(dev, &groupid))
> -		return 0;
> +	if (attr->show)
> +		ret = attr->show(group, buf);
> +	return ret;
> +}
> +
> +static ssize_t iommu_group_attr_store(struct kobject *kobj,
> +				      struct attribute *__attr,
> +				      const char *buf, size_t count)
> +{
> +	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
> +	struct iommu_group *group = to_iommu_group(kobj);
> +	ssize_t ret = -EIO;
>  
> -	return sprintf(buf, "%u", groupid);
> +	if (attr->store)
> +		ret = attr->store(group, buf, count);
> +	return ret;
>  }
> -static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
>  
> -static int add_iommu_group(struct device *dev, void *data)
> +static void iommu_group_release(struct kobject *kobj)
>  {
> -	unsigned int groupid;
> +	struct iommu_group *group = to_iommu_group(kobj);
>  
> -	if (iommu_device_group(dev, &groupid) == 0)
> -		return device_create_file(dev, &dev_attr_iommu_group);
> +	mutex_lock(&iommu_group_mutex);
> +	idr_remove(&iommu_group_idr, group->id);
> +	mutex_unlock(&iommu_group_mutex);
>  
> -	return 0;
> +	kfree(group);
>  }
>  
> -static int remove_iommu_group(struct device *dev)
> +static const struct sysfs_ops iommu_group_sysfs_ops = {
> +	.show = iommu_group_attr_show,
> +	.store = iommu_group_attr_store,
> +};
> +
> +static struct kobj_type iommu_group_ktype = {
> +	.sysfs_ops = &iommu_group_sysfs_ops,
> +	.release = iommu_group_release,
> +};
> +
> +static int add_iommu_group(struct device *dev, void *data)
>  {
> -	unsigned int groupid;
> +	struct iommu_ops *ops = data;
>  
> -	if (iommu_device_group(dev, &groupid) == 0)
> -		device_remove_file(dev, &dev_attr_iommu_group);
> +	if (!ops->add_device)
> +		return -ENODEV;
>  
> -	return 0;
> +	WARN_ON(dev->iommu_group);
> +
> +	return ops->add_device(dev);
>  }
>  
>  static int iommu_device_notifier(struct notifier_block *nb,
>  				 unsigned long action, void *data)
>  {
>  	struct device *dev = data;
> +	struct iommu_ops *ops = dev->bus->iommu_ops;
>  
>  	if (action == BUS_NOTIFY_ADD_DEVICE)
> -		return add_iommu_group(dev, NULL);
> -	else if (action == BUS_NOTIFY_DEL_DEVICE)
> -		return remove_iommu_group(dev);
> +		return add_iommu_group(dev, ops);
> +	else if (action == BUS_NOTIFY_DEL_DEVICE) {
> +		if (ops->remove_device && dev->iommu_group) {
> +			ops->remove_device(dev);
> +			return 0;
> +		}
> +	}
>  
>  	return 0;
>  }
> @@ -79,7 +127,97 @@ static struct notifier_block iommu_device_nb = {
>  static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
>  {
>  	bus_register_notifier(bus, &iommu_device_nb);
> -	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
> +	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> +}
> +
> +struct iommu_group *iommu_group_alloc(struct device *dma_dev)
> +{
> +	struct iommu_group *group;
> +	int ret;
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return ERR_PTR(-ENOMEM);
> +
> +	group->kobj.kset = iommu_group_kset;
> +	group->dma_dev = dma_dev;
> +	atomic_set(&group->refcnt, 0);
> +
> +	mutex_lock(&iommu_group_mutex);
> +
> +again:
> +	if (unlikely(0 == idr_pre_get(&iommu_group_idr, GFP_KERNEL))) {
> +		kfree(group);
> +		mutex_unlock(&iommu_group_mutex);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	if (-EAGAIN == idr_get_new(&iommu_group_idr, group, &group->id))
> +		goto again;
> +
> +	mutex_unlock(&iommu_group_mutex);
> +
> +	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
> +				   NULL, "%d", group->id);
> +	if (ret) {
> +		iommu_group_release(&group->kobj);
> +		return ERR_PTR(ret);
> +	}
> +
> +	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
> +        if (!group->devices_kobj) {
> +		kobject_put(&group->kobj);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	dma_dev->iommu_group = group;
> +
> +	return group;
> +}
> +
> +void iommu_group_free(struct iommu_group *group)
> +{
> +	group->dma_dev->iommu_group = NULL;
> +	WARN_ON(atomic_read(&group->refcnt));

This kind of implies that the representative device doesn't count
against the refcount, which seems wrong.

> +	kobject_put(group->devices_kobj);
> +	kobject_put(&group->kobj);
> +}
> +
> +bool iommu_group_empty(struct iommu_group *group)
> +{
> +	return (0 == atomic_read(&group->refcnt));
> +}
> +
> +int iommu_group_add_device(struct iommu_group *group, struct device *dev)
> +{
> +	int ret;
> +
> +	atomic_inc(&group->refcnt);
> +
> +	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
> +	if (ret)
> +		return ret;
> +
> +	ret = sysfs_create_link(group->devices_kobj, &dev->kobj,
> +                                kobject_name(&dev->kobj));
> +	if (ret) {
> +		sysfs_remove_link(&dev->kobj, "iommu_group");
> +		return ret;
> +	}
> +
> +	dev->iommu_group = group;
> +
> +	return 0;
> +}
> +
> +void iommu_group_remove_device(struct device *dev)
> +{
> +	sysfs_remove_link(dev->iommu_group->devices_kobj,
> +			  kobject_name(&dev->kobj));
> +	sysfs_remove_link(&dev->kobj, "iommu_group");
> +
> +	atomic_dec(&dev->iommu_group->refcnt);
> +	dev->iommu_group = NULL;
>  }
>  
>  /**
> @@ -335,9 +473,23 @@ EXPORT_SYMBOL_GPL(iommu_unmap);
>  
>  int iommu_device_group(struct device *dev, unsigned int *groupid)
>  {
> -	if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group)
> -		return dev->bus->iommu_ops->device_group(dev, groupid);
> +	if (dev->iommu_group) {
> +		*groupid = dev->iommu_group->id;
> +		return 0;
> +	}
>  
>  	return -ENODEV;
>  }
>  EXPORT_SYMBOL_GPL(iommu_device_group);
> +
> +static int __init iommu_init(void)
> +{
> +	iommu_group_kset = kset_create_and_add("iommu_groups", NULL, NULL);
> +	idr_init(&iommu_group_idr);
> +	mutex_init(&iommu_group_mutex);
> +
> +	BUG_ON(!iommu_group_kset);
> +
> +	return 0;
> +}
> +subsys_initcall(iommu_init);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 2ee375c..24004d6 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -60,6 +60,8 @@ struct iommu_domain {
>   * @iova_to_phys: translate iova to physical address
>   * @domain_has_cap: domain capabilities query
>   * @commit: commit iommu domain
> + * @add_device: add device to iommu grouping
> + * @remove_device: remove device from iommu grouping
>   * @pgsize_bitmap: bitmap of supported page sizes
>   */
>  struct iommu_ops {
> @@ -76,10 +78,25 @@ struct iommu_ops {
>  	int (*domain_has_cap)(struct iommu_domain *domain,
>  			      unsigned long cap);
>  	int (*device_group)(struct device *dev, unsigned int *groupid);
> +	int (*add_device)(struct device *dev);
> +	void (*remove_device)(struct device *dev);
>  	unsigned long pgsize_bitmap;
>  };
>  
> +/**
> + * struct iommu_group - groups of devices representing iommu visibility
> + * @dma_dev: all dma from the group appears to the iommu using this source id
> + * @kobj: iommu group node in sysfs
> + * @devices_kobj: sysfs subdir node for linking devices
> + * @refcnt: number of devices in group
> + * @id: unique id number for the group (for easy sysfs listing)
> + */
>  struct iommu_group {
> +	struct device *dma_dev;

So, a "representative" device works very nicely for the AMD and Intel
IOMMUs.  But I'm not sure it makes sense for every IOMMU - we could
point it to the bridge in the Power case, but I'm not sure if that
really makes sense (particularly when it's a host bridge, not a p2p
bridge).  In embedded cases it may make even less sense.

I think it would be better to move this into iommu driver specific
data (which I guess would mean adding a facility for iommu driver
specific data, either with a private pointer or using container_of).

> +	struct kobject kobj;
> +	struct kobject *devices_kobj;

The devices_kobj has the same lifetime as the group, so it might as
well be a static member.

> +	atomic_t refcnt;
> +	int id;
>  };
>  
>  extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
> @@ -101,6 +118,12 @@ extern int iommu_domain_has_cap(struct iommu_domain *domain,
>  extern void iommu_set_fault_handler(struct iommu_domain *domain,
>  					iommu_fault_handler_t handler);
>  extern int iommu_device_group(struct device *dev, unsigned int *groupid);
> +extern struct iommu_group *iommu_group_alloc(struct device *dev);
> +extern void iommu_group_free(struct iommu_group *group);
> +extern bool iommu_group_empty(struct iommu_group *group);
> +extern int iommu_group_add_device(struct iommu_group *group,
> +				  struct device *dev);
> +extern void iommu_group_remove_device(struct device *dev);
>  
>  /**
>   * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [RFC PATCH 1/3] iommu: Introduce iommu_group
@ 2012-04-18 20:07       ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-18 20:07 UTC (permalink / raw)
  To: David Gibson; +Cc: dwmw2, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

On Wed, 2012-04-18 at 19:58 +1000, David Gibson wrote:
> On Mon, Apr 02, 2012 at 03:14:40PM -0600, Alex Williamson wrote:
> > IOMMUs often do not have visibility of individual devices in the
> > system.  Due to IOMMU design, bus topology, or device quirks, we
> > can often only identify groups of devices.  Examples include
> > Intel VT-d & AMD-Vi which often have function level visibility
> > compared to POWER partitionable endpoints which have bridge level
> > granularity.
> 
> That's a significant oversimplification of the situation on POWER,
> although it doesn't really matter in this context.  On older (i.e. pre
> PCI-E) hardware, PEs have either host bridge (i.e. domain)
> granularity, or, IIUC, in some cases p2p bridge granularity, using
> special p2p bridges, since that's the only real way to do iommu
> differentiation without the PCI-E requestor IDs.  This isn't as coarse
> as it seems in practice, because the hardware is usually built with a
> bridge per physical PCI slot.
> 
> On newer PCI-E hardware, the PE granularity is basically a firmware
> decision, and can go down to function level.  I believe pHyp puts the
> granularity at the bridge level.  Our non-virtualized Linux "firmware"
> currently does put it at the function level, but Ben is thinking about
> changing that to bridge level: again, because of the hardware design
> that isn't as coarse as it seems, and at this level we can guarantee
> isolation in hardware to a degree that's not possible at the function
> level.

Ok, thanks for the clarification.  This should support either model and
it will be up to the iommu driver to fill the groups with the right
devices.

> >  PCIe-to-PCI bridges also often cloud the IOMMU
> > visibility as it cannot distinguish devices behind the bridge.
> > Devices can also sometimes hurt themselves by initiating DMA using
> > the wrong source ID on a multifunction PCI device.
> > 
> > IOMMU groups are meant to help solve these problems and hopefully
> > become the working unit of the IOMMU API.
> 
> So far, so simple.  No objections here.  I am trying to work out what
> the real difference in approach is in this series from either your or
> my earlier isolation group series.  AFAICT it's just that this
> approach is explicitly only about IOMMU identity, ignoring (here) any
> other factors which might affect isolation.  Or am I missing
> something?

Yes, they are very similar and actually also similar to how VFIO manages
groups.  It's easy to start some kind of group structure; the hard part
is in the details and particularly where to stop.  My attempt to figure
out where isolation groups stop went quite poorly, ending up with a
ridiculously complicated mess of hierarchical groups that got worse as I
tried to fill in the gaps.

With iommu groups I try to take a step back and simplify.  I initially
had a goal of describing only the minimum iommu granularity sets, this
is where the dma_dev idea came from.  But the iommu granularity doesn't
really guarantee all that much or allow any ability for the iommu driver
to add additional policies that would be useful for userspace drivers
(ex. multi-function devices and peer-to-peer isolation).  So again I've
had to allow that a group might have multiple visible requestor IDs
within the group.  This time though I'm trying to disallow hierarchies,
which means that even kernel (dma_ops) usage of groups is restricted to
a single, platform-defined level of isolation.  I'm also trying to stay
out of the business of providing a group management interface.  I only
want to describe groups.  Things like stopping driver probes should be
device level problems.  In effect, this level should not be providing
enforcement and ownership, something like VFIO will do that.  So the
differences are subtle, but important.  Thanks,

Alex



* Re: [RFC PATCH 2/3] iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d
@ 2012-04-18 20:28       ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-18 20:28 UTC (permalink / raw)
  To: David Gibson; +Cc: dwmw2, iommu, qemu-devel, joerg.roedel, kvm, linux-kernel

On Wed, 2012-04-18 at 21:55 +1000, David Gibson wrote:
> On Mon, Apr 02, 2012 at 03:14:46PM -0600, Alex Williamson wrote:
> > IOMMU groups define the minimum granularity of the IOMMU.  We therefore
> > create groups using a dma_dev which is the effective requestor ID for
> > the entire group.  Additional devices can be added to groups based on
> > system topology, IOMMU limitations, or device quirks.
> > 
> > This implementation also includes a simple idr database for creating
> > a flat address space of group IDs across multiple IOMMUs.  Updates
> > included for Intel VT-d, using example iommu callbacks for adding and
removing devices, as well as AMD-Vi, which tracks devices on its own.
> > We should be able to better integrate the iommu_group within existing
> > AMD-Vi structs or provide a private data location within the iommu_group
> > where we can eventually reduce redundancy.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Looks reasonable as far as it goes.  This still lacks an obvious means
> for doing group assignment of devices on busses subordinate to devices
> that are on iommu managed busses.  Which as we discussed earlier is a
> bit of a can of worms, but necessary.

Yes, this was a big can of worms in my isolation group attempt.  I'm
hoping it's easier here.  Perhaps the iommu core needs to register add
device notifiers for every bus and for each new device without a direct
iommu, walk up the device tree and add it as a hidden device in the
first iommu group it finds.  Nested iommus also throw a serious wrench
into the problem, but as we've discussed, are probably safe to ignore.
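Roughly, the walk I have in mind looks like this (userspace model with
stand-in structs; the real code would of course use struct device and a
bus notifier, these names are just for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structs (assumption: names mirror
 * struct device / struct iommu_group, but this is model code only). */
struct iommu_group {
	int id;
};

struct device {
	struct device *parent;
	struct iommu_group *iommu_group;
};

/* The proposed notifier policy: for a device with no IOMMU visibility
 * of its own, walk up the device tree and inherit the first group
 * found; a device with no IOMMU above it belongs to no group. */
static struct iommu_group *find_ancestor_group(struct device *dev)
{
	for (dev = dev->parent; dev; dev = dev->parent)
		if (dev->iommu_group)
			return dev->iommu_group;
	return NULL;	/* no IOMMU above this device */
}
```

So a device behind, say, a PCI-to-ISA bridge would end up hidden in the
group of the first upstream device that the IOMMU can actually see.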

> > ---
> > 
> >  drivers/iommu/amd_iommu.c   |   50 ++++++-----
> >  drivers/iommu/intel-iommu.c |   76 +++++++++--------
> >  drivers/iommu/iommu.c       |  198 ++++++++++++++++++++++++++++++++++++++-----
> >  include/linux/iommu.h       |   23 +++++
> >  4 files changed, 267 insertions(+), 80 deletions(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index f75e060..876db28 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -256,9 +256,10 @@ static bool check_device(struct device *dev)
> >  
> >  static int iommu_init_device(struct device *dev)
> >  {
> > -	struct pci_dev *pdev = to_pci_dev(dev);
> > +	struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
> >  	struct iommu_dev_data *dev_data;
> >  	u16 alias;
> > +	int ret;
> >  
> >  	if (dev->archdata.iommu)
> >  		return 0;
> > @@ -279,6 +280,26 @@ static int iommu_init_device(struct device *dev)
> >  			return -ENOTSUPP;
> >  		}
> >  		dev_data->alias_data = alias_data;
> > +
> > +		dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
> > +	} else
> > +		dma_pdev = pdev;
> > +
> > +	/* dma_pdev = iommu_pci_quirk(dma_pdev) */
> 
> Presumably an actual implementation of the quirk is coming?  It might

Yeah, I can add a simple one.  It may also be a good place to implement
any kind of global policy, like grouping multifunction devices or
enforcing PCI ACS.  Once we have that, I hope we could just add some
kind of get/set iommudata to the group that might allow dma_ops to start
making use of groups.
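As a sketch of the kind of policy the quirk hook could apply (userspace
model only; MODEL_DEVFN/MODEL_SLOT stand in for PCI_DEVFN/PCI_SLOT from
<linux/pci.h>, and the acs_isolated flag is a placeholder for a real
ACS capability check):

```c
#include <assert.h>

/* Model of the PCI devfn encoding: devfn = slot << 3 | function. */
#define MODEL_DEVFN(slot, fn)	((((slot) & 0x1f) << 3) | ((fn) & 0x07))
#define MODEL_SLOT(devfn)	(((devfn) >> 3) & 0x1f)

/* One possible quirk policy: unless ACS (or an equivalent guarantee)
 * isolates the functions, collapse every function of a multifunction
 * device onto function 0 so they all land in one group. */
static unsigned int quirk_mf_devfn(unsigned int devfn, int acs_isolated)
{
	if (acs_isolated)
		return devfn;	/* isolation proven: keep per-function groups */
	return MODEL_DEVFN(MODEL_SLOT(devfn), 0);
}
```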

> be an idea for it to take both the individual and representative
> devices, in case that information is useful.

Ok.

> > +	if (!dma_pdev->dev.iommu_group) {
> > +		struct iommu_group *group;
> > +
> > +		group = iommu_group_alloc(&dma_pdev->dev);
> > +		if (IS_ERR(group))
> > +			return PTR_ERR(group);
> > +	}
> > +
> > +	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
> > +	if (ret) {
> > +		if (iommu_group_empty(dma_pdev->dev.iommu_group))
> > +			iommu_group_free(dma_pdev->dev.iommu_group);
> > +		return ret;
> 
> It might be nice to have generic helpers that do this kind of lifetime
> handling of the groups, but that's a detail.

I've cleaned this up a lot in my working version, using kobject
references makes this much better.
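The lifetime rule the rework implements reduces to this (userspace
model; in the real code the refcounting and the release hook are the
group kobject's, these helper names are made up):

```c
#include <assert.h>

/* Group lifetime tied to per-device references: the release hook runs
 * only when the last device reference is dropped. */
struct model_group {
	int refcnt;
	int released;	/* stand-in for iommu_group_release() having run */
};

static void model_group_add_device(struct model_group *g)
{
	g->refcnt++;
}

static void model_group_remove_device(struct model_group *g)
{
	if (--g->refcnt == 0)
		g->released = 1;	/* last reference gone: free the group */
}
```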

> >  	}
> >  
> >  	if (pci_iommuv2_capable(pdev)) {
> > @@ -309,6 +330,12 @@ static void iommu_ignore_device(struct device *dev)
> >  
> >  static void iommu_uninit_device(struct device *dev)
> >  {
> > +	struct iommu_group *group = dev->iommu_group;
> > +
> > +	iommu_group_remove_device(dev);
> > +	if (iommu_group_empty(group))
> > +		iommu_group_free(group);
> > +
> >  	/*
> >  	 * Nothing to do here - we keep dev_data around for unplugged devices
> >  	 * and reuse it when the device is re-plugged - not doing so would
> > @@ -3191,26 +3218,6 @@ static int amd_iommu_domain_has_cap(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  
> > -static int amd_iommu_device_group(struct device *dev, unsigned int *groupid)
> > -{
> > -	struct iommu_dev_data *dev_data = dev->archdata.iommu;
> > -	struct pci_dev *pdev = to_pci_dev(dev);
> > -	u16 devid;
> > -
> > -	if (!dev_data)
> > -		return -ENODEV;
> > -
> > -	if (pdev->is_virtfn || !iommu_group_mf)
> > -		devid = dev_data->devid;
> > -	else
> > -		devid = calc_devid(pdev->bus->number,
> > -				   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
> > -
> > -	*groupid = amd_iommu_alias_table[devid];
> > -
> > -	return 0;
> > -}
> > -
> >  static struct iommu_ops amd_iommu_ops = {
> >  	.domain_init = amd_iommu_domain_init,
> >  	.domain_destroy = amd_iommu_domain_destroy,
> > @@ -3220,7 +3227,6 @@ static struct iommu_ops amd_iommu_ops = {
> >  	.unmap = amd_iommu_unmap,
> >  	.iova_to_phys = amd_iommu_iova_to_phys,
> >  	.domain_has_cap = amd_iommu_domain_has_cap,
> > -	.device_group = amd_iommu_device_group,
> >  	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
> >  };
> >  
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index c9c6053..41ab7d0 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4075,54 +4075,59 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  
> > -/*
> > - * Group numbers are arbitrary.  Device with the same group number
> > - * indicate the iommu cannot differentiate between them.  To avoid
> > - * tracking used groups we just use the seg|bus|devfn of the lowest
> > - * level we're able to differentiate devices
> > - */
> > -static int intel_iommu_device_group(struct device *dev, unsigned int *groupid)
> > +static int intel_iommu_add_device(struct device *dev)
> >  {
> >  	struct pci_dev *pdev = to_pci_dev(dev);
> > -	struct pci_dev *bridge;
> > -	union {
> > -		struct {
> > -			u8 devfn;
> > -			u8 bus;
> > -			u16 segment;
> > -		} pci;
> > -		u32 group;
> > -	} id;
> > -
> > -	if (iommu_no_mapping(dev))
> > -		return -ENODEV;
> > -
> > -	id.pci.segment = pci_domain_nr(pdev->bus);
> > -	id.pci.bus = pdev->bus->number;
> > -	id.pci.devfn = pdev->devfn;
> > +	struct pci_dev *bridge, *dma_pdev = pdev;
> > +	int ret;
> >  
> > -	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
> > +	if (!device_to_iommu(pci_domain_nr(pdev->bus),
> > +			     pdev->bus->number, pdev->devfn))
> >  		return -ENODEV;
> >  
> >  	bridge = pci_find_upstream_pcie_bridge(pdev);
> >  	if (bridge) {
> > -		if (pci_is_pcie(bridge)) {
> > -			id.pci.bus = bridge->subordinate->number;
> > -			id.pci.devfn = 0;
> > -		} else {
> > -			id.pci.bus = bridge->bus->number;
> > -			id.pci.devfn = bridge->devfn;
> > -		}
> > +		if (pci_is_pcie(bridge))
> > +			dma_pdev = pci_get_domain_bus_and_slot(
> > +						pci_domain_nr(pdev->bus),
> > +						bridge->subordinate->number, 0);
> > +		else
> > +			dma_pdev = bridge;
> > +	}
> > +
> > +	if (!bridge && !pdev->is_virtfn && iommu_group_mf)
> > +		dma_pdev = pci_get_slot(dma_pdev->bus, 0);
> > +
> > +	/* dma_pdev = iommu_pci_quirk(dma_pdev); */
> > +
> > +	if (!dma_pdev->dev.iommu_group) {
> > +		struct iommu_group *group;
> > +
> > +		group = iommu_group_alloc(&dma_pdev->dev);
> > +		if (IS_ERR(group))
> > +			return PTR_ERR(group);
> >  	}
> >  
> > -	if (!pdev->is_virtfn && iommu_group_mf)
> > -		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
> > +	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
> > +	if (ret) {
> > +		if (iommu_group_empty(dma_pdev->dev.iommu_group))
> > +			iommu_group_free(dma_pdev->dev.iommu_group);
> >  
> > -	*groupid = id.group;
> > +		return ret;
> > +	}
> >  
> >  	return 0;
> >  }
> >  
> > +static void intel_iommu_remove_device(struct device *dev)
> > +{
> > +	struct iommu_group *group = dev->iommu_group;
> > +
> > +	iommu_group_remove_device(dev);
> > +	if (iommu_group_empty(group))
> > +		iommu_group_free(group);
> > +}
> > +
> >  static struct iommu_ops intel_iommu_ops = {
> >  	.domain_init	= intel_iommu_domain_init,
> >  	.domain_destroy = intel_iommu_domain_destroy,
> > @@ -4132,7 +4137,8 @@ static struct iommu_ops intel_iommu_ops = {
> >  	.unmap		= intel_iommu_unmap,
> >  	.iova_to_phys	= intel_iommu_iova_to_phys,
> >  	.domain_has_cap = intel_iommu_domain_has_cap,
> > -	.device_group	= intel_iommu_device_group,
> > +	.add_device	= intel_iommu_add_device,
> > +	.remove_device	= intel_iommu_remove_device,
> >  	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
> >  };
> >  
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 2198b2d..6a51ed9 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -26,48 +26,96 @@
> >  #include <linux/slab.h>
> >  #include <linux/errno.h>
> >  #include <linux/iommu.h>
> > +#include <linux/idr.h>
> >  
> > -static ssize_t show_iommu_group(struct device *dev,
> > -				struct device_attribute *attr, char *buf)
> > +static struct kset *iommu_group_kset;
> > +static struct idr iommu_group_idr;
> > +static struct mutex iommu_group_mutex;
> > +
> > +struct iommu_group_attribute {
> > +	struct attribute attr;
> > +	ssize_t (*show)(struct iommu_group *group, char *buf);
> > +	ssize_t (*store)(struct iommu_group *group,
> > +			 const char *buf, size_t count);
> > +};
> > +
> > +#define to_iommu_group_attr(_attr)	\
> > +	container_of(_attr, struct iommu_group_attribute, attr)
> > +#define to_iommu_group(_kobj)		\
> > +	container_of(_kobj, struct iommu_group, kobj)
> > +
> > +static ssize_t iommu_group_attr_show(struct kobject *kobj,
> > +				     struct attribute *__attr, char *buf)
> >  {
> > -	unsigned int groupid;
> > +	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
> > +	struct iommu_group *group = to_iommu_group(kobj);
> > +	ssize_t ret = -EIO;
> >  
> > -	if (iommu_device_group(dev, &groupid))
> > -		return 0;
> > +	if (attr->show)
> > +		ret = attr->show(group, buf);
> > +	return ret;
> > +}
> > +
> > +static ssize_t iommu_group_attr_store(struct kobject *kobj,
> > +				      struct attribute *__attr,
> > +				      const char *buf, size_t count)
> > +{
> > +	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
> > +	struct iommu_group *group = to_iommu_group(kobj);
> > +	ssize_t ret = -EIO;
> >  
> > -	return sprintf(buf, "%u", groupid);
> > +	if (attr->store)
> > +		ret = attr->store(group, buf, count);
> > +	return ret;
> >  }
> > -static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
> >  
> > -static int add_iommu_group(struct device *dev, void *data)
> > +static void iommu_group_release(struct kobject *kobj)
> >  {
> > -	unsigned int groupid;
> > +	struct iommu_group *group = to_iommu_group(kobj);
> >  
> > -	if (iommu_device_group(dev, &groupid) == 0)
> > -		return device_create_file(dev, &dev_attr_iommu_group);
> > +	mutex_lock(&iommu_group_mutex);
> > +	idr_remove(&iommu_group_idr, group->id);
> > +	mutex_unlock(&iommu_group_mutex);
> >  
> > -	return 0;
> > +	kfree(group);
> >  }
> >  
> > -static int remove_iommu_group(struct device *dev)
> > +static const struct sysfs_ops iommu_group_sysfs_ops = {
> > +	.show = iommu_group_attr_show,
> > +	.store = iommu_group_attr_store,
> > +};
> > +
> > +static struct kobj_type iommu_group_ktype = {
> > +	.sysfs_ops = &iommu_group_sysfs_ops,
> > +	.release = iommu_group_release,
> > +};
> > +
> > +static int add_iommu_group(struct device *dev, void *data)
> >  {
> > -	unsigned int groupid;
> > +	struct iommu_ops *ops = data;
> >  
> > -	if (iommu_device_group(dev, &groupid) == 0)
> > -		device_remove_file(dev, &dev_attr_iommu_group);
> > +	if (!ops->add_device)
> > +		return -ENODEV;
> >  
> > -	return 0;
> > +	WARN_ON(dev->iommu_group);
> > +
> > +	return ops->add_device(dev);
> >  }
> >  
> >  static int iommu_device_notifier(struct notifier_block *nb,
> >  				 unsigned long action, void *data)
> >  {
> >  	struct device *dev = data;
> > +	struct iommu_ops *ops = dev->bus->iommu_ops;
> >  
> >  	if (action == BUS_NOTIFY_ADD_DEVICE)
> > -		return add_iommu_group(dev, NULL);
> > -	else if (action == BUS_NOTIFY_DEL_DEVICE)
> > -		return remove_iommu_group(dev);
> > +		return add_iommu_group(dev, ops);
> > +	else if (action == BUS_NOTIFY_DEL_DEVICE) {
> > +		if (ops->remove_device && dev->iommu_group) {
> > +			ops->remove_device(dev);
> > +			return 0;
> > +		}
> > +	}
> >  
> >  	return 0;
> >  }
> > @@ -79,7 +127,97 @@ static struct notifier_block iommu_device_nb = {
> >  static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
> >  {
> >  	bus_register_notifier(bus, &iommu_device_nb);
> > -	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
> > +	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> > +}
> > +
> > +struct iommu_group *iommu_group_alloc(struct device *dma_dev)
> > +{
> > +	struct iommu_group *group;
> > +	int ret;
> > +
> > +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> > +	if (!group)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	group->kobj.kset = iommu_group_kset;
> > +	group->dma_dev = dma_dev;
> > +	atomic_set(&group->refcnt, 0);
> > +
> > +	mutex_lock(&iommu_group_mutex);
> > +
> > +again:
> > +	if (unlikely(0 == idr_pre_get(&iommu_group_idr, GFP_KERNEL))) {
> > +		kfree(group);
> > +		mutex_unlock(&iommu_group_mutex);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	if (-EAGAIN == idr_get_new(&iommu_group_idr, group, &group->id))
> > +		goto again;
> > +
> > +	mutex_unlock(&iommu_group_mutex);
> > +
> > +	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
> > +				   NULL, "%d", group->id);
> > +	if (ret) {
> > +		iommu_group_release(&group->kobj);
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
> > +	if (!group->devices_kobj) {
> > +		kobject_put(&group->kobj);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	dma_dev->iommu_group = group;
> > +
> > +	return group;
> > +}
> > +
> > +void iommu_group_free(struct iommu_group *group)
> > +{
> > +	group->dma_dev->iommu_group = NULL;
> > +	WARN_ON(atomic_read(&group->refcnt));
> 
> This kind of implies that the representative device doesn't count
> against the refcount, which seems wrong.

It is, fixed in upcoming revision.

> > +	kobject_put(group->devices_kobj);
> > +	kobject_put(&group->kobj);
> > +}
> > +
> > +bool iommu_group_empty(struct iommu_group *group)
> > +{
> > +	return (0 == atomic_read(&group->refcnt));
> > +}
> > +
> > +int iommu_group_add_device(struct iommu_group *group, struct device *dev)
> > +{
> > +	int ret;
> > +
> > +	atomic_inc(&group->refcnt);
> > +
> > +	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = sysfs_create_link(group->devices_kobj, &dev->kobj,
> > +                                kobject_name(&dev->kobj));
> > +	if (ret) {
> > +		sysfs_remove_link(&dev->kobj, "iommu_group");
> > +		return ret;
> > +	}
> > +
> > +	dev->iommu_group = group;
> > +
> > +	return 0;
> > +}
> > +
> > +void iommu_group_remove_device(struct device *dev)
> > +{
> > +	sysfs_remove_link(dev->iommu_group->devices_kobj,
> > +			  kobject_name(&dev->kobj));
> > +	sysfs_remove_link(&dev->kobj, "iommu_group");
> > +
> > +	atomic_dec(&dev->iommu_group->refcnt);
> > +	dev->iommu_group = NULL;
> >  }
> >  
> >  /**
> > @@ -335,9 +473,23 @@ EXPORT_SYMBOL_GPL(iommu_unmap);
> >  
> >  int iommu_device_group(struct device *dev, unsigned int *groupid)
> >  {
> > -	if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group)
> > -		return dev->bus->iommu_ops->device_group(dev, groupid);
> > +	if (dev->iommu_group) {
> > +		*groupid = dev->iommu_group->id;
> > +		return 0;
> > +	}
> >  
> >  	return -ENODEV;
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_device_group);
> > +
> > +static int __init iommu_init(void)
> > +{
> > +	iommu_group_kset = kset_create_and_add("iommu_groups", NULL, NULL);
> > +	idr_init(&iommu_group_idr);
> > +	mutex_init(&iommu_group_mutex);
> > +
> > +	BUG_ON(!iommu_group_kset);
> > +
> > +	return 0;
> > +}
> > +subsys_initcall(iommu_init);
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 2ee375c..24004d6 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -60,6 +60,8 @@ struct iommu_domain {
> >   * @iova_to_phys: translate iova to physical address
> >   * @domain_has_cap: domain capabilities query
> >   * @commit: commit iommu domain
> > + * @add_device: add device to iommu grouping
> > + * @remove_device: remove device from iommu grouping
> >   * @pgsize_bitmap: bitmap of supported page sizes
> >   */
> >  struct iommu_ops {
> > @@ -76,10 +78,25 @@ struct iommu_ops {
> >  	int (*domain_has_cap)(struct iommu_domain *domain,
> >  			      unsigned long cap);
> >  	int (*device_group)(struct device *dev, unsigned int *groupid);
> > +	int (*add_device)(struct device *dev);
> > +	void (*remove_device)(struct device *dev);
> >  	unsigned long pgsize_bitmap;
> >  };
> >  
> > +/**
> > + * struct iommu_group - groups of devices representing iommu visibility
> > + * @dma_dev: all dma from the group appears to the iommu using this source id
> > + * @kobj: iommu group node in sysfs
> > + * @devices_kobj: sysfs subdir node for linking devices
> > + * @refcnt: number of devices in group
> > + * @id: unique id number for the group (for easy sysfs listing)
> > + */
> >  struct iommu_group {
> > +	struct device *dma_dev;
> 
> So, a "representative" device works very nicely for the AMD and Intel
> IOMMUs.  But I'm not sure it makes sense for every IOMMU - we could
> point it to the bridge in the Power case, but I'm not sure if that
> really makes sense (particularly when it's a host bridge, not a p2p
> bridge).  In embedded cases it may make even less sense.

Yep, as noted in other reply, I've dropped this.

> I think it would be better to move this into iommu driver specific
> data (which I guess would mean adding a facility for iommu driver
> specific data, either with a private pointer or using container_of).

I was thinking a private pointer.  Embedding the iommu group into a
iommu structure probably wouldn't work well with the kobjects used to
represent the group.
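For reference, the container_of option would look roughly like this
(hypothetical names throughout, modeled in userspace; as noted, the
kobject lifetime makes the private-pointer option the more practical
one here):

```c
#include <assert.h>
#include <stddef.h>

/* Recover a driver-specific wrapper from a pointer to an embedded
 * generic core struct (the kernel's container_of does the same). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct group_core {		/* stand-in for the generic iommu_group */
	int id;
};

struct amd_group {		/* hypothetical driver-specific wrapper */
	unsigned int devid;
	struct group_core core;	/* embedded generic part */
};

static unsigned int amd_group_devid(struct group_core *core)
{
	return model_container_of(core, struct amd_group, core)->devid;
}
```

A private pointer avoids the embedding entirely, at the cost of one
extra allocation and indirection per group.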

> > +	struct kobject kobj;
> > +	struct kobject *devices_kobj;
> 
> The devices_kobj has the same lifetime as the group, so it might as
> well be a static member.

In my working version, devices_kobj is what holds the reference for the
group.  Documentation/kobject.txt also expressly forbids more than a
single embedded kobject.  Thanks,

Alex

> > +	atomic_t refcnt;
> > +	int id;
> >  };
> >  
> >  extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
> > @@ -101,6 +118,12 @@ extern int iommu_domain_has_cap(struct iommu_domain *domain,
> >  extern void iommu_set_fault_handler(struct iommu_domain *domain,
> >  					iommu_fault_handler_t handler);
> >  extern int iommu_device_group(struct device *dev, unsigned int *groupid);
> > +extern struct iommu_group *iommu_group_alloc(struct device *dev);
> > +extern void iommu_group_free(struct iommu_group *group);
> > +extern bool iommu_group_empty(struct iommu_group *group);
> > +extern int iommu_group_add_device(struct iommu_group *group,
> > +				  struct device *dev);
> > +extern void iommu_group_remove_device(struct device *dev);
> >  
> >  /**
> >   * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework
> > 
> 





* Re: [RFC PATCH 2/3] iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d
@ 2012-04-18 20:28       ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2012-04-18 20:28 UTC (permalink / raw)
  To: David Gibson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

On Wed, 2012-04-18 at 21:55 +1000, David Gibson wrote:
> On Mon, Apr 02, 2012 at 03:14:46PM -0600, Alex Williamson wrote:
> > IOMMU groups define the minimum granularity of the IOMMU.  We therefore
> > create groups using a dma_dev which is the effective requestor ID for
> > the entire group.  Additional devices can be added to groups based on
> > system topology, IOMMU limitations, or device quirks.
> > 
> > This implementation also includes a simple idr database for creating
> > a flat address space of group IDs across multiple IOMMUs.  Updates
> > included for Intel VT-d, using example iommu callbacks for adding and
> > removing devices, as well as AMD-Vi, which tracks devices on it's own.
> > We should be able to better integrate the iommu_group within existing
> > AMD-Vi structs or provide a private data location within the iommu_group
> > where we can eventually reduce redundancy.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> Looks reasonable as far as it goes.  This still lacks an obvious means
> for doing group assignment of devices on busses subordinate to devices
> that are on iommu managed busses.  Which as we discussed earlier is a
> bit of a can of worms, but necessary.

Yes, this was a big can of worms in my isolation group attempt.  I'm
hoping it's easier here.  Perhaps the iommu core needs to register
add-device notifiers for every bus; for each new device without a
direct iommu, it could walk up the device tree and add the device as a
hidden member of the first iommu group it finds.  Nested iommus also
throw a serious wrench into the problem, but as we've discussed, they
are probably safe to ignore.

> > ---
> > 
> >  drivers/iommu/amd_iommu.c   |   50 ++++++-----
> >  drivers/iommu/intel-iommu.c |   76 +++++++++--------
> >  drivers/iommu/iommu.c       |  198 ++++++++++++++++++++++++++++++++++++++-----
> >  include/linux/iommu.h       |   23 +++++
> >  4 files changed, 267 insertions(+), 80 deletions(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index f75e060..876db28 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -256,9 +256,10 @@ static bool check_device(struct device *dev)
> >  
> >  static int iommu_init_device(struct device *dev)
> >  {
> > -	struct pci_dev *pdev = to_pci_dev(dev);
> > +	struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
> >  	struct iommu_dev_data *dev_data;
> >  	u16 alias;
> > +	int ret;
> >  
> >  	if (dev->archdata.iommu)
> >  		return 0;
> > @@ -279,6 +280,26 @@ static int iommu_init_device(struct device *dev)
> >  			return -ENOTSUPP;
> >  		}
> >  		dev_data->alias_data = alias_data;
> > +
> > +		dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
> > +	} else
> > +		dma_pdev = pdev;
> > +
> > +	/* dma_pdev = iommu_pci_quirk(dma_pdev) */
> 
> Presumably an actual implementation of the quirk is coming?  It might

Yeah, I can add a simple one.  It may also be a good place to implement
any kind of global policy, like grouping multifunction devices or
enforcing PCI ACS.  Once we have that, I hope we could just add some
kind of get/set iommudata to the group that might allow dma_ops to start
making use of groups.

> be an idea for it to take both the individual and representative
> devices, in case that information is useful.

Ok.

> > +	if (!dma_pdev->dev.iommu_group) {
> > +		struct iommu_group *group;
> > +
> > +		group = iommu_group_alloc(&dma_pdev->dev);
> > +		if (IS_ERR(group))
> > +			return PTR_ERR(group);
> > +	}
> > +
> > +	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
> > +	if (ret) {
> > +		if (iommu_group_empty(dma_pdev->dev.iommu_group))
> > +			iommu_group_free(dma_pdev->dev.iommu_group);
> > +		return ret;
> 
> It might be nice to have generic helpers that do this kind of lifetime
> handling of the groups, but that's a detail.

I've cleaned this up a lot in my working version, using kobject
references makes this much better.

> >  	}
> >  
> >  	if (pci_iommuv2_capable(pdev)) {
> > @@ -309,6 +330,12 @@ static void iommu_ignore_device(struct device *dev)
> >  
> >  static void iommu_uninit_device(struct device *dev)
> >  {
> > +	struct iommu_group *group = dev->iommu_group;
> > +
> > +	iommu_group_remove_device(dev);
> > +	if (iommu_group_empty(group))
> > +		iommu_group_free(group);
> > +
> >  	/*
> >  	 * Nothing to do here - we keep dev_data around for unplugged devices
> >  	 * and reuse it when the device is re-plugged - not doing so would
> > @@ -3191,26 +3218,6 @@ static int amd_iommu_domain_has_cap(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  
> > -static int amd_iommu_device_group(struct device *dev, unsigned int *groupid)
> > -{
> > -	struct iommu_dev_data *dev_data = dev->archdata.iommu;
> > -	struct pci_dev *pdev = to_pci_dev(dev);
> > -	u16 devid;
> > -
> > -	if (!dev_data)
> > -		return -ENODEV;
> > -
> > -	if (pdev->is_virtfn || !iommu_group_mf)
> > -		devid = dev_data->devid;
> > -	else
> > -		devid = calc_devid(pdev->bus->number,
> > -				   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
> > -
> > -	*groupid = amd_iommu_alias_table[devid];
> > -
> > -	return 0;
> > -}
> > -
> >  static struct iommu_ops amd_iommu_ops = {
> >  	.domain_init = amd_iommu_domain_init,
> >  	.domain_destroy = amd_iommu_domain_destroy,
> > @@ -3220,7 +3227,6 @@ static struct iommu_ops amd_iommu_ops = {
> >  	.unmap = amd_iommu_unmap,
> >  	.iova_to_phys = amd_iommu_iova_to_phys,
> >  	.domain_has_cap = amd_iommu_domain_has_cap,
> > -	.device_group = amd_iommu_device_group,
> >  	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
> >  };
> >  
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index c9c6053..41ab7d0 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4075,54 +4075,59 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  
> > -/*
> > - * Group numbers are arbitrary.  Device with the same group number
> > - * indicate the iommu cannot differentiate between them.  To avoid
> > - * tracking used groups we just use the seg|bus|devfn of the lowest
> > - * level we're able to differentiate devices
> > - */
> > -static int intel_iommu_device_group(struct device *dev, unsigned int *groupid)
> > +static int intel_iommu_add_device(struct device *dev)
> >  {
> >  	struct pci_dev *pdev = to_pci_dev(dev);
> > -	struct pci_dev *bridge;
> > -	union {
> > -		struct {
> > -			u8 devfn;
> > -			u8 bus;
> > -			u16 segment;
> > -		} pci;
> > -		u32 group;
> > -	} id;
> > -
> > -	if (iommu_no_mapping(dev))
> > -		return -ENODEV;
> > -
> > -	id.pci.segment = pci_domain_nr(pdev->bus);
> > -	id.pci.bus = pdev->bus->number;
> > -	id.pci.devfn = pdev->devfn;
> > +	struct pci_dev *bridge, *dma_pdev = pdev;
> > +	int ret;
> >  
> > -	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
> > +	if (!device_to_iommu(pci_domain_nr(pdev->bus),
> > +			     pdev->bus->number, pdev->devfn))
> >  		return -ENODEV;
> >  
> >  	bridge = pci_find_upstream_pcie_bridge(pdev);
> >  	if (bridge) {
> > -		if (pci_is_pcie(bridge)) {
> > -			id.pci.bus = bridge->subordinate->number;
> > -			id.pci.devfn = 0;
> > -		} else {
> > -			id.pci.bus = bridge->bus->number;
> > -			id.pci.devfn = bridge->devfn;
> > -		}
> > +		if (pci_is_pcie(bridge))
> > +			dma_pdev = pci_get_domain_bus_and_slot(
> > +						pci_domain_nr(pdev->bus),
> > +						bridge->subordinate->number, 0);
> > +		else
> > +			dma_pdev = bridge;
> > +	}
> > +
> > +	if (!bridge && !pdev->is_virtfn && iommu_group_mf)
> > +		dma_pdev = pci_get_slot(dma_pdev->bus, 0);
> > +
> > +	/* dma_pdev = iommu_pci_quirk(dma_pdev); */
> > +
> > +	if (!dma_pdev->dev.iommu_group) {
> > +		struct iommu_group *group;
> > +
> > +		group = iommu_group_alloc(&dma_pdev->dev);
> > +		if (IS_ERR(group))
> > +			return PTR_ERR(group);
> >  	}
> >  
> > -	if (!pdev->is_virtfn && iommu_group_mf)
> > -		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
> > +	ret = iommu_group_add_device(dma_pdev->dev.iommu_group, dev);
> > +	if (ret) {
> > +		if (iommu_group_empty(dma_pdev->dev.iommu_group))
> > +			iommu_group_free(dma_pdev->dev.iommu_group);
> >  
> > -	*groupid = id.group;
> > +		return ret;
> > +	}
> >  
> >  	return 0;
> >  }
> >  
> > +static void intel_iommu_remove_device(struct device *dev)
> > +{
> > +	struct iommu_group *group = dev->iommu_group;
> > +
> > +	iommu_group_remove_device(dev);
> > +	if (iommu_group_empty(group))
> > +		iommu_group_free(group);
> > +}
> > +
> >  static struct iommu_ops intel_iommu_ops = {
> >  	.domain_init	= intel_iommu_domain_init,
> >  	.domain_destroy = intel_iommu_domain_destroy,
> > @@ -4132,7 +4137,8 @@ static struct iommu_ops intel_iommu_ops = {
> >  	.unmap		= intel_iommu_unmap,
> >  	.iova_to_phys	= intel_iommu_iova_to_phys,
> >  	.domain_has_cap = intel_iommu_domain_has_cap,
> > -	.device_group	= intel_iommu_device_group,
> > +	.add_device	= intel_iommu_add_device,
> > +	.remove_device	= intel_iommu_remove_device,
> >  	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
> >  };
> >  
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 2198b2d..6a51ed9 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -26,48 +26,96 @@
> >  #include <linux/slab.h>
> >  #include <linux/errno.h>
> >  #include <linux/iommu.h>
> > +#include <linux/idr.h>
> >  
> > -static ssize_t show_iommu_group(struct device *dev,
> > -				struct device_attribute *attr, char *buf)
> > +static struct kset *iommu_group_kset;
> > +static struct idr iommu_group_idr;
> > +static struct mutex iommu_group_mutex;
> > +
> > +struct iommu_group_attribute {
> > +	struct attribute attr;
> > +	ssize_t (*show)(struct iommu_group *group, char *buf);
> > +	ssize_t (*store)(struct iommu_group *group,
> > +			 const char *buf, size_t count);
> > +};
> > +
> > +#define to_iommu_group_attr(_attr)	\
> > +	container_of(_attr, struct iommu_group_attribute, attr)
> > +#define to_iommu_group(_kobj)		\
> > +	container_of(_kobj, struct iommu_group, kobj)
> > +
> > +static ssize_t iommu_group_attr_show(struct kobject *kobj,
> > +				     struct attribute *__attr, char *buf)
> >  {
> > -	unsigned int groupid;
> > +	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
> > +	struct iommu_group *group = to_iommu_group(kobj);
> > +	ssize_t ret = -EIO;
> >  
> > -	if (iommu_device_group(dev, &groupid))
> > -		return 0;
> > +	if (attr->show)
> > +		ret = attr->show(group, buf);
> > +	return ret;
> > +}
> > +
> > +static ssize_t iommu_group_attr_store(struct kobject *kobj,
> > +				      struct attribute *__attr,
> > +				      const char *buf, size_t count)
> > +{
> > +	struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
> > +	struct iommu_group *group = to_iommu_group(kobj);
> > +	ssize_t ret = -EIO;
> >  
> > -	return sprintf(buf, "%u", groupid);
> > +	if (attr->store)
> > +		ret = attr->store(group, buf, count);
> > +	return ret;
> >  }
> > -static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
> >  
> > -static int add_iommu_group(struct device *dev, void *data)
> > +static void iommu_group_release(struct kobject *kobj)
> >  {
> > -	unsigned int groupid;
> > +	struct iommu_group *group = to_iommu_group(kobj);
> >  
> > -	if (iommu_device_group(dev, &groupid) == 0)
> > -		return device_create_file(dev, &dev_attr_iommu_group);
> > +	mutex_lock(&iommu_group_mutex);
> > +	idr_remove(&iommu_group_idr, group->id);
> > +	mutex_unlock(&iommu_group_mutex);
> >  
> > -	return 0;
> > +	kfree(group);
> >  }
> >  
> > -static int remove_iommu_group(struct device *dev)
> > +static const struct sysfs_ops iommu_group_sysfs_ops = {
> > +	.show = iommu_group_attr_show,
> > +	.store = iommu_group_attr_store,
> > +};
> > +
> > +static struct kobj_type iommu_group_ktype = {
> > +	.sysfs_ops = &iommu_group_sysfs_ops,
> > +	.release = iommu_group_release,
> > +};
> > +
> > +static int add_iommu_group(struct device *dev, void *data)
> >  {
> > -	unsigned int groupid;
> > +	struct iommu_ops *ops = data;
> >  
> > -	if (iommu_device_group(dev, &groupid) == 0)
> > -		device_remove_file(dev, &dev_attr_iommu_group);
> > +	if (!ops->add_device)
> > +		return -ENODEV;
> >  
> > -	return 0;
> > +	WARN_ON(dev->iommu_group);
> > +
> > +	return ops->add_device(dev);
> >  }
> >  
> >  static int iommu_device_notifier(struct notifier_block *nb,
> >  				 unsigned long action, void *data)
> >  {
> >  	struct device *dev = data;
> > +	struct iommu_ops *ops = dev->bus->iommu_ops;
> >  
> >  	if (action == BUS_NOTIFY_ADD_DEVICE)
> > -		return add_iommu_group(dev, NULL);
> > -	else if (action == BUS_NOTIFY_DEL_DEVICE)
> > -		return remove_iommu_group(dev);
> > +		return add_iommu_group(dev, ops);
> > +	else if (action == BUS_NOTIFY_DEL_DEVICE) {
> > +		if (ops->remove_device && dev->iommu_group) {
> > +			ops->remove_device(dev);
> > +			return 0;
> > +		}
> > +	}
> >  
> >  	return 0;
> >  }
> > @@ -79,7 +127,97 @@ static struct notifier_block iommu_device_nb = {
> >  static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
> >  {
> >  	bus_register_notifier(bus, &iommu_device_nb);
> > -	bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
> > +	bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> > +}
> > +
> > +struct iommu_group *iommu_group_alloc(struct device *dma_dev)
> > +{
> > +	struct iommu_group *group;
> > +	int ret;
> > +
> > +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> > +	if (!group)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	group->kobj.kset = iommu_group_kset;
> > +	group->dma_dev = dma_dev;
> > +	atomic_set(&group->refcnt, 0);
> > +
> > +	mutex_lock(&iommu_group_mutex);
> > +
> > +again:
> > +	if (unlikely(0 == idr_pre_get(&iommu_group_idr, GFP_KERNEL))) {
> > +		kfree(group);
> > +		mutex_unlock(&iommu_group_mutex);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	if (-EAGAIN == idr_get_new(&iommu_group_idr, group, &group->id))
> > +		goto again;
> > +
> > +	mutex_unlock(&iommu_group_mutex);
> > +
> > +	ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
> > +				   NULL, "%d", group->id);
> > +	if (ret) {
> > +		iommu_group_release(&group->kobj);
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
> > +        if (!group->devices_kobj) {
> > +		kobject_put(&group->kobj);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	dma_dev->iommu_group = group;
> > +
> > +	return group;
> > +}
> > +
> > +void iommu_group_free(struct iommu_group *group)
> > +{
> > +	group->dma_dev->iommu_group = NULL;
> > +	WARN_ON(atomic_read(&group->refcnt));
> 
> This kind of implies that the representative device doesn't count
> against the refcount, which seems wrong.

It is, fixed in upcoming revision.

> > +	kobject_put(group->devices_kobj);
> > +	kobject_put(&group->kobj);
> > +}
> > +
> > +bool iommu_group_empty(struct iommu_group *group)
> > +{
> > +	return (0 == atomic_read(&group->refcnt));
> > +}
> > +
> > +int iommu_group_add_device(struct iommu_group *group, struct device *dev)
> > +{
> > +	int ret;
> > +
> > +	atomic_inc(&group->refcnt);
> > +
> > +	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = sysfs_create_link(group->devices_kobj, &dev->kobj,
> > +                                kobject_name(&dev->kobj));
> > +	if (ret) {
> > +		sysfs_remove_link(&dev->kobj, "iommu_group");
> > +		return ret;
> > +	}
> > +
> > +	dev->iommu_group = group;
> > +
> > +	return 0;
> > +}
> > +
> > +void iommu_group_remove_device(struct device *dev)
> > +{
> > +	sysfs_remove_link(dev->iommu_group->devices_kobj,
> > +			  kobject_name(&dev->kobj));
> > +	sysfs_remove_link(&dev->kobj, "iommu_group");
> > +
> > +	atomic_dec(&dev->iommu_group->refcnt);
> > +	dev->iommu_group = NULL;
> >  }
> >  
> >  /**
> > @@ -335,9 +473,23 @@ EXPORT_SYMBOL_GPL(iommu_unmap);
> >  
> >  int iommu_device_group(struct device *dev, unsigned int *groupid)
> >  {
> > -	if (iommu_present(dev->bus) && dev->bus->iommu_ops->device_group)
> > -		return dev->bus->iommu_ops->device_group(dev, groupid);
> > +	if (dev->iommu_group) {
> > +		*groupid = dev->iommu_group->id;
> > +		return 0;
> > +	}
> >  
> >  	return -ENODEV;
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_device_group);
> > +
> > +static int __init iommu_init(void)
> > +{
> > +	iommu_group_kset = kset_create_and_add("iommu_groups", NULL, NULL);
> > +	idr_init(&iommu_group_idr);
> > +	mutex_init(&iommu_group_mutex);
> > +
> > +	BUG_ON(!iommu_group_kset);
> > +
> > +	return 0;
> > +}
> > +subsys_initcall(iommu_init);
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 2ee375c..24004d6 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -60,6 +60,8 @@ struct iommu_domain {
> >   * @iova_to_phys: translate iova to physical address
> >   * @domain_has_cap: domain capabilities query
> >   * @commit: commit iommu domain
> > + * @add_device: add device to iommu grouping
> > + * @remove_device: remove device from iommu grouping
> >   * @pgsize_bitmap: bitmap of supported page sizes
> >   */
> >  struct iommu_ops {
> > @@ -76,10 +78,25 @@ struct iommu_ops {
> >  	int (*domain_has_cap)(struct iommu_domain *domain,
> >  			      unsigned long cap);
> >  	int (*device_group)(struct device *dev, unsigned int *groupid);
> > +	int (*add_device)(struct device *dev);
> > +	void (*remove_device)(struct device *dev);
> >  	unsigned long pgsize_bitmap;
> >  };
> >  
> > +/**
> > + * struct iommu_group - groups of devices representing iommu visibility
> > + * @dma_dev: all dma from the group appears to the iommu using this source id
> > + * @kobj: iommu group node in sysfs
> > + * @devices_kobj: sysfs subdir node for linking devices
> > + * @refcnt: number of devices in group
> > + * @id: unique id number for the group (for easy sysfs listing)
> > + */
> >  struct iommu_group {
> > +	struct device *dma_dev;
> 
> So, a "representative" device works very nicely for the AMD and Intel
> IOMMUs.  But I'm not sure it makes sense for any IOMMU - we could
> point it to the bridge in the Power case, but I'm not sure if that
> really makes sense (particularly when it's a host bridge, not a p2p
> bridge).  In embedded cases it may make even less sense.

Yep, as noted in other reply, I've dropped this.

> I think it would be better to move this into iommu driver specific
> data (which I guess would mean adding a facility for iommu driver
> specific data, either with a private pointer or using container_of).

I was thinking a private pointer.  Embedding the iommu group into an
iommu structure probably wouldn't work well with the kobjects used to
represent the group.

> > +	struct kobject kobj;
> > +	struct kobject *devices_kobj;
> 
> The devices_kobj has the same lifetime as the group, so it might as
> well be a static member.

In my working version, devices_kobj is what holds the reference for the
group.  Documentation/kobject.txt also expressly forbids more than a
single embedded kobject.  Thanks,

Alex

> > +	atomic_t refcnt;
> > +	int id;
> >  };
> >  
> >  extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
> > @@ -101,6 +118,12 @@ extern int iommu_domain_has_cap(struct iommu_domain *domain,
> >  extern void iommu_set_fault_handler(struct iommu_domain *domain,
> >  					iommu_fault_handler_t handler);
> >  extern int iommu_device_group(struct device *dev, unsigned int *groupid);
> > +extern struct iommu_group *iommu_group_alloc(struct device *dev);
> > +extern void iommu_group_free(struct iommu_group *group);
> > +extern bool iommu_group_empty(struct iommu_group *group);
> > +extern int iommu_group_add_device(struct iommu_group *group,
> > +				  struct device *dev);
> > +extern void iommu_group_remove_device(struct device *dev);
> >  
> >  /**
> >   * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework
> > 
> 


end of thread, other threads:[~2012-04-18 20:28 UTC | newest]

Thread overview: 21+ messages
2012-04-02 21:14 [RFC PATCH 0/3] IOMMU groups Alex Williamson
2012-04-02 21:14 ` [RFC PATCH 1/3] iommu: Introduce iommu_group Alex Williamson
2012-04-18  9:58   ` David Gibson
2012-04-18 20:07     ` Alex Williamson
2012-04-02 21:14 ` [RFC PATCH 2/3] iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d Alex Williamson
2012-04-18 11:55   ` David Gibson
2012-04-18 20:28     ` Alex Williamson
2012-04-02 21:14 ` [RFC PATCH 3/3] iommu: Create attach/detach group interface Alex Williamson
2012-04-10 21:03 ` [RFC PATCH 0/3] IOMMU groups Alex Williamson