* [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Ohad Ben-Cohen

v2->v3:

- s/KB/KiB/ (David W)

v1->v2:

- split into patches (by keeping the old code around until all drivers are converted) (Joerg)


Ohad Ben-Cohen (6):
  iommu/core: split mapping to page sizes as supported by the hardware
  iommu/omap: announce supported page sizes
  iommu/msm: announce supported page sizes
  iommu/amd: announce supported page sizes
  iommu/intel: announce supported page sizes
  iommu/core: remove the temporary register_iommu_pgsize API

 drivers/iommu/amd_iommu.c   |   20 ++++++-
 drivers/iommu/intel-iommu.c |   20 ++++++-
 drivers/iommu/iommu.c       |  130 +++++++++++++++++++++++++++++++++++++++----
 drivers/iommu/msm_iommu.c   |    8 ++-
 drivers/iommu/omap-iommu.c  |    6 ++-
 drivers/iommu/omap-iovmm.c  |   12 +---
 include/linux/iommu.h       |    7 +-
 virt/kvm/iommu.c            |    4 +-
 8 files changed, 177 insertions(+), 30 deletions(-)

-- 
1.7.4.1


* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Ohad Ben-Cohen, Stepan Moskovchenko, kvm

When mapping a memory region, split it into the page sizes supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.

This allows a more lenient granularity of mappings: traditionally the
IOMMU API took a page 'order' as the mapping size and handed the mapping
directly to the low-level iommu drivers. Now that the IOMMU core can
split arbitrary memory regions into pages, that limitation can be
removed, so users don't have to split those regions by themselves.
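
For illustration only, here is a stand-alone, user-space sketch of the
greedy page-size selection that iommu_map() now performs; the helper
name (pick_pgsize) and the example values are made up here, but the
logic mirrors the selection done by the loop added below:

#include <stdio.h>

/* pick the largest supported page size that still fits and is aligned */
static unsigned long pick_pgsize(unsigned long pgsize_bitmap,
				 unsigned long iova, unsigned long paddr,
				 unsigned long size)
{
	unsigned long addr_merge = iova | paddr;
	unsigned long best = 0;
	unsigned int bit;

	for (bit = 0; bit < 8 * sizeof(unsigned long); bit++) {
		unsigned long pgsize = 1UL << bit;

		if (!(pgsize_bitmap & pgsize))
			continue;	/* page size not supported */
		if (size < pgsize)
			break;		/* remaining region too small */
		if (addr_merge & (pgsize - 1))
			break;		/* iova/paddr not aligned to it */
		best = pgsize;		/* remember the biggest fit so far */
	}
	return best;
}

int main(void)
{
	/* 4KiB | 64KiB | 1MiB | 16MiB, as the OMAP driver advertises */
	unsigned long bitmap = 0x1000 | 0x10000 | 0x100000 | 0x1000000;

	/* a 2MiB region that is 1MiB-aligned: 1MiB pages get picked */
	printf("pgsize = 0x%lx\n",
	       pick_pgsize(bitmap, 0x100000, 0x500000, 0x200000));
	return 0;
}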

Currently the supported page sizes are advertised once and then remain
static. That works well for OMAP (and seemingly MSM too), but it would
probably not suffice for Intel's hardware, where the page size
capabilities may differ between DMA remapping devices. This limitation
can be dealt with later, if desired. For now, the existing IOMMU API
behavior is retained (see: "iommu/intel: announce supported page sizes").

As requested, register_iommu() isn't changed yet, so we can convert
the IOMMU drivers in subsequent patches, and after all the drivers
are converted, register_iommu will be changed (and the temporary
register_iommu_pgsize() will be removed).

Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted to
pass the mapping size in bytes instead of as a page order.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  158 +++++++++++++++++++++++++++++++++++++++++---
 drivers/iommu/omap-iovmm.c |   12 +---
 include/linux/iommu.h      |    6 +-
 virt/kvm/iommu.c           |    4 +-
 4 files changed, 157 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c68ff29..7c01c8c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/kernel.h>
 #include <linux/bug.h>
 #include <linux/types.h>
@@ -23,15 +25,73 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/iommu.h>
+#include <linux/bitmap.h>
 
 static struct iommu_ops *iommu_ops;
 
+/* bitmap of supported page sizes */
+static unsigned long *iommu_pgsize_bitmap;
+
+/* number of bits used to represent the supported pages */
+static unsigned int iommu_nr_page_bits;
+
+/* size of the smallest supported page (in bytes) */
+static unsigned int iommu_min_pagesz;
+
+/* bit number of the smallest supported page */
+static unsigned int iommu_min_page_idx;
+
+/**
+ * register_iommu_pgsize() - register IOMMU hardware
+ * @ops: iommu handlers
+ * @pgsize_bitmap: bitmap of page sizes supported by the hardware
+ * @nr_page_bits: size of @pgsize_bitmap (in bits)
+ *
+ * Note: this is a temporary function, which will be removed once
+ * all IOMMU drivers are converted. The only reason it exists is to
+ * allow splitting the pgsizes changes to several patches in order to ease
+ * the review.
+ */
+void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
+					unsigned int nr_page_bits)
+{
+	if (iommu_ops || iommu_pgsize_bitmap || !nr_page_bits)
+		BUG();
+
+	iommu_ops = ops;
+	iommu_pgsize_bitmap = pgsize_bitmap;
+	iommu_nr_page_bits = nr_page_bits;
+
+	/* find the minimum page size and its index only once */
+	iommu_min_page_idx = find_first_bit(pgsize_bitmap, nr_page_bits);
+	iommu_min_pagesz = 1 << iommu_min_page_idx;
+}
+
+/*
+ * default pagesize bitmap, will be removed once all IOMMU drivers
+ * are converted
+ */
+static unsigned long default_iommu_pgsizes = ~0xFFFUL;
+
 void register_iommu(struct iommu_ops *ops)
 {
 	if (iommu_ops)
 		BUG();
 
 	iommu_ops = ops;
+
+	/*
+	 * set default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages
+	 */
+	iommu_pgsize_bitmap = &default_iommu_pgsizes;
+	iommu_nr_page_bits = BITS_PER_LONG;
+
+	/* find the minimum page size and its index only once */
+	iommu_min_page_idx = find_first_bit(iommu_pgsize_bitmap,
+						iommu_nr_page_bits);
+	iommu_min_pagesz = 1 << iommu_min_page_idx;
 }
 
 bool iommu_found(void)
@@ -109,26 +169,104 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
 
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	int ret = 0;
+
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			(unsigned long)size, iommu_min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
+				(unsigned long)paddr, (unsigned long)size);
+
+	while (size) {
+		unsigned long pgsize = iommu_min_pagesz;
+		unsigned long idx = iommu_min_page_idx;
+		unsigned long addr_merge = iova | paddr;
+		int order;
+
+		/* find the max page size with which iova, paddr are aligned */
+		for (;;) {
+			unsigned long try_pgsize;
+
+			idx = find_next_bit(iommu_pgsize_bitmap,
+						iommu_nr_page_bits, idx + 1);
+
+			/* no more pages to check ? */
+			if (idx >= iommu_nr_page_bits)
+				break;
+
+			try_pgsize = 1 << idx;
 
-	size         = 0x1000UL << gfp_order;
+			/* page too big ? addresses not aligned ? */
+			if (size < try_pgsize ||
+					!IS_ALIGNED(addr_merge, try_pgsize))
+				break;
 
-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+			pgsize = try_pgsize;
+		}
 
-	return iommu_ops->map(domain, iova, paddr, gfp_order, prot);
+		order = get_order(pgsize);
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx order %d\n", iova,
+					(unsigned long)paddr, order);
+
+		ret = iommu_ops->map(domain, iova, paddr, order, prot);
+		if (ret)
+			break;
+
+		size -= pgsize;
+		iova += pgsize;
+		paddr += pgsize;
+	}
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+int iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	size_t size;
+	int order, unmapped_size, unmapped_order, total_unmapped = 0;
+
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, iommu_min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n",
+				iova, (unsigned long)size, iommu_min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova,
+							(unsigned long)size);
+
+	while (size > total_unmapped) {
+		order = get_order(size - total_unmapped);
+
+		unmapped_order = iommu_ops->unmap(domain, iova, order);
+		if (unmapped_order < 0)
+			return unmapped_order;
+
+		pr_debug("unmapped: iova 0x%lx order %d\n", iova,
+							unmapped_order);
 
-	size         = 0x1000UL << gfp_order;
+		unmapped_size = 0x1000UL << unmapped_order;
 
-	BUG_ON(!IS_ALIGNED(iova, size));
+		iova += unmapped_size;
+		total_unmapped += unmapped_size;
+	}
 
-	return iommu_ops->unmap(domain, iova, gfp_order);
+	return get_order(total_unmapped);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..f4dea5a 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;
 
 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;
 
-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);
 
-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;
 
@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;
 
 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);
 
 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);
 
 		da += bytes;
 	}
@@ -474,12 +470,10 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;
 
 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);
 
-		err = iommu_unmap(domain, start, order);
+		err = iommu_unmap(domain, start, bytes);
 		if (err < 0)
 			break;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d084e87..1806956 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -61,6 +61,8 @@ struct iommu_ops {
 #ifdef CONFIG_IOMMU_API
 
 extern void register_iommu(struct iommu_ops *ops);
+extern void register_iommu_pgsize(struct iommu_ops *ops,
+		unsigned long *pgsize_bitmap, unsigned int nr_page_bits);
 extern bool iommu_found(void);
 extern struct iommu_domain *iommu_domain_alloc(void);
 extern void iommu_domain_free(struct iommu_domain *domain);
@@ -69,9 +71,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
+		     phys_addr_t paddr, size_t size, int prot);
 extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 78c80f6..ea142d3 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -293,7 +293,7 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 		pfn  = phys >> PAGE_SHIFT;
 
 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
+		order       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
 		unmap_pages = 1ULL << order;
 
 		/* Unpin all pages we just unmapped to not leak any memory */
-- 
1.7.4.1


* [PATCH v3 2/6] iommu/omap: announce supported page sizes
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Ohad Ben-Cohen

Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes.

This way the IOMMU core can split any arbitrary-sized, physically
contiguous region that it needs to map into these page sizes.
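
For reference, a minimal sketch (assuming the SZ_* macros expand to their
usual power-of-two byte values) of how these four sizes land in the pgsize
bitmap, and why the 25 bits passed to register_iommu_pgsize() below are
enough to describe it:

#include <stdio.h>

int main(void)
{
	/* SZ_4K | SZ_64K | SZ_1M | SZ_16M, written out as byte values */
	unsigned long pgsizes = 0x1000 | 0x10000 | 0x100000 | 0x1000000;
	unsigned int bits = 0;

	while (pgsizes >> bits)
		bits++;

	/* bits 12, 16, 20 and 24 are set; the highest page size is
	 * 16MiB = 1 << 24, so 25 bits cover the whole bitmap */
	printf("bits needed: %u\n", bits);	/* prints 25 */
	return 0;
}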

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
---
 drivers/iommu/omap-iommu.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 7e0188f..403dd6a 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1202,6 +1202,9 @@ static int omap_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
+/* bitmap of the page sizes supported by the OMAP IOMMU hardware */
+static unsigned long omap_iommu_pgsizes = SZ_4K | SZ_64K | SZ_1M | SZ_16M;
+
 static struct iommu_ops omap_iommu_ops = {
 	.domain_init	= omap_iommu_domain_init,
 	.domain_destroy	= omap_iommu_domain_destroy,
@@ -1225,7 +1228,8 @@ static int __init omap_iommu_init(void)
 		return -ENOMEM;
 	iopte_cachep = p;
 
-	register_iommu(&omap_iommu_ops);
+	/* we're only using the first 25 bits of the pgsizes bitmap */
+	register_iommu_pgsize(&omap_iommu_ops, &omap_iommu_pgsizes, 25);
 
 	return platform_driver_register(&omap_iommu_driver);
 }
-- 
1.7.4.1


* [PATCH v3 3/6] iommu/msm: announce supported page sizes
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Ohad Ben-Cohen, Stepan Moskovchenko

Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes.

This way the IOMMU core can split any arbitrary-sized, physically
contiguous region that it needs to map into these page sizes.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
---
 drivers/iommu/msm_iommu.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index d1733f6..a4ed116 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -676,6 +676,9 @@ fail:
 	return 0;
 }
 
+/* bitmap of the page sizes currently supported */
+static unsigned long msm_iommu_pgsizes = SZ_4K | SZ_64K | SZ_1M | SZ_16M;
+
 static struct iommu_ops msm_iommu_ops = {
 	.domain_init = msm_iommu_domain_init,
 	.domain_destroy = msm_iommu_domain_destroy,
@@ -728,7 +731,10 @@ static void __init setup_iommu_tex_classes(void)
 static int __init msm_iommu_init(void)
 {
 	setup_iommu_tex_classes();
-	register_iommu(&msm_iommu_ops);
+
+	/* we're only using the first 25 bits of the pgsizes bitmap */
+	register_iommu_pgsize(&msm_iommu_ops, &msm_iommu_pgsizes, 25);
+
 	return 0;
 }
 
-- 
1.7.4.1


* [PATCH v3 4/6] iommu/amd: announce supported page sizes
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Ohad Ben-Cohen

Let the IOMMU core know we support arbitrary page sizes (as long as
they're a power-of-two multiple of 4KiB).

This way the IOMMU core will retain the existing behavior we're used to
(see the sketch below); it will let us map regions that:
- have a size that is a power-of-two multiple of 4KiB
- are naturally aligned
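
A minimal, stand-alone sketch (illustration only, not part of the patch)
of what the ~0xFFFUL bitmap below advertises: every bit from 12 upwards
is set, so every power-of-two size >= 4KiB counts as a supported page
size and the core keeps handing us the same mappings it did before:

#include <stdio.h>

int main(void)
{
	unsigned long pgsizes = ~0xFFFUL;
	unsigned int bit;

	/* print the first few advertised page sizes: 4KiB, 8KiB, ... */
	for (bit = 12; bit < 20; bit++)
		if (pgsizes & (1UL << bit))
			printf("supported: 0x%lx\n", 1UL << bit);
	return 0;
}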

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
---
 drivers/iommu/amd_iommu.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a14f8dc..80191d2 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2488,12 +2488,31 @@ static unsigned device_dma_ops_init(void)
 }
 
 /*
+ * This bitmap is used to advertise the page sizes our hardware supports
+ * to the IOMMU core, which will then use this information to split
+ * physically contiguous memory regions it is mapping into page sizes
+ * that we support.
+ *
+ * Traditionally the IOMMU core just handed us the mappings directly,
+ * after making sure the size is an order of a 4KiB page and that the
+ * mapping has natural alignment.
+ *
+ * To retain this behavior, we currently advertise that we support
+ * all page sizes that are an order of 4KiB.
+ *
+ * If at some point we'd like to utilize the IOMMU core's new behavior,
+ * we could change this to advertise the real page sizes we support.
+ */
+static unsigned long amd_iommu_pgsizes = ~0xFFFUL;
+
+/*
  * The function which clues the AMD IOMMU driver into dma_ops.
  */
 
 void __init amd_iommu_init_api(void)
 {
-	register_iommu(&amd_iommu_ops);
+	register_iommu_pgsize(&amd_iommu_ops, &amd_iommu_pgsizes,
+						BITS_PER_LONG);
 }
 
 int __init amd_iommu_init_dma_ops(void)
-- 
1.7.4.1


* [PATCH v3 5/6] iommu/intel: announce supported page sizes
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Ohad Ben-Cohen

Let the IOMMU core know we support arbitrary page sizes (as long as
they're a power-of-two multiple of 4KiB).

This way the IOMMU core will retain the existing behavior we're used to;
it will let us map regions that:
- have a size that is a power-of-two multiple of 4KiB
- are naturally aligned

Note: Intel IOMMU hardware doesn't support arbitrary page sizes,
but the driver does (it splits arbitrary-sized mappings into
the pages supported by the hardware).

To make everything simpler for now, though, this patch effectively tells
the IOMMU core to keep giving this driver the same memory regions it did
before, so nothing is changed as far as it's concerned.

At this point, the page sizes announced remain static within the IOMMU
core. For this driver to correctly utilize the core's pgsize splitting,
some core changes still seem necessary, because Intel's IOMMU page size
capabilities can potentially differ between DMA remapping devices.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Woodhouse <dwmw2@infradead.org>
---
 drivers/iommu/intel-iommu.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c621c98..46c21a2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3426,6 +3426,24 @@ static struct notifier_block device_nb = {
 	.notifier_call = device_notifier,
 };
 
+/*
+ * This bitmap is used to advertise the page sizes our hardware supports
+ * to the IOMMU core, which will then use this information to split
+ * physically contiguous memory regions it is mapping into page sizes
+ * that we support.
+ *
+ * Traditionally the IOMMU core just handed us the mappings directly,
+ * after making sure the size is an order of a 4KiB page and that the
+ * mapping has natural alignment.
+ *
+ * To retain this behavior, we currently advertise that we support
+ * all page sizes that are an order of 4KiB.
+ *
+ * If at some point we'd like to utilize the IOMMU core's new behavior,
+ * we could change this to advertise the real page sizes we support.
+ */
+static unsigned long intel_iommu_pgsizes = ~0xFFFUL;
+
 int __init intel_iommu_init(void)
 {
 	int ret = 0;
@@ -3486,7 +3504,8 @@ int __init intel_iommu_init(void)
 
 	init_iommu_pm_ops();
 
-	register_iommu(&intel_iommu_ops);
+	register_iommu_pgsize(&intel_iommu_ops, &intel_iommu_pgsizes,
+							BITS_PER_LONG);
 
 	bus_register_notifier(&pci_bus_type, &device_nb);
 
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread
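
To make the last paragraph concrete: if the driver later advertised
its real capabilities instead of ~0xFFFUL, the bitmap would carry only
the sizes the hardware can actually enter into its page tables. Purely
as an illustration (assuming, say, hardware with 4KiB pages plus 2MiB
and 1GiB superpages; intel_iommu_real_pgsizes is a hypothetical name,
and per-device capabilities are exactly why core changes are needed):

/* hypothetical alternative to ~0xFFFUL -- not what this patch does */
static unsigned long intel_iommu_real_pgsizes = (1UL << 12) |	/* 4KiB */
						(1UL << 21) |	/* 2MiB */
						(1UL << 30);	/* 1GiB */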

* [PATCH v3 6/6] iommu/core: remove the temporary register_iommu_pgsize API
  2011-09-16 17:51 ` Ohad Ben-Cohen
@ 2011-09-16 17:51   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Ohad Ben-Cohen, Stepan Moskovchenko

Now that all IOMMU drivers are converted to the new
register_iommu_pgsize() API, the old code can be removed, and
we can s/register_iommu_pgsize/register_iommu/.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Brown <davidb@codeaurora.org>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
---
 drivers/iommu/amd_iommu.c   |    3 +--
 drivers/iommu/intel-iommu.c |    3 +--
 drivers/iommu/iommu.c       |   34 +---------------------------------
 drivers/iommu/msm_iommu.c   |    2 +-
 drivers/iommu/omap-iommu.c  |    2 +-
 include/linux/iommu.h       |    5 ++---
 6 files changed, 7 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 80191d2..cf29645 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2511,8 +2511,7 @@ static unsigned long amd_iommu_pgsizes = ~0xFFFUL;
 
 void __init amd_iommu_init_api(void)
 {
-	register_iommu_pgsize(&amd_iommu_ops, &amd_iommu_pgsizes,
-						BITS_PER_LONG);
+	register_iommu(&amd_iommu_ops, &amd_iommu_pgsizes, BITS_PER_LONG);
 }
 
 int __init amd_iommu_init_dma_ops(void)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 46c21a2..c013b85 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3504,8 +3504,7 @@ int __init intel_iommu_init(void)
 
 	init_iommu_pm_ops();
 
-	register_iommu_pgsize(&intel_iommu_ops, &intel_iommu_pgsizes,
-							BITS_PER_LONG);
+	register_iommu(&intel_iommu_ops, &intel_iommu_pgsizes, BITS_PER_LONG);
 
 	bus_register_notifier(&pci_bus_type, &device_nb);
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7c01c8c..8bbd1aa 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -46,13 +46,8 @@ static unsigned int iommu_min_page_idx;
  * @ops: iommu handlers
  * @pgsize_bitmap: bitmap of page sizes supported by the hardware
  * @nr_page_bits: size of @pgsize_bitmap (in bits)
- *
- * Note: this is a temporary function, which will be removed once
- * all IOMMU drivers are converted. The only reason it exists is to
- * allow splitting the pgsizes changes to several patches in order to ease
- * the review.
  */
-void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
+void register_iommu(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
 					unsigned int nr_page_bits)
 {
 	if (iommu_ops || iommu_pgsize_bitmap || !nr_page_bits)
@@ -67,33 +62,6 @@ void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
 	iommu_min_pagesz = 1 << iommu_min_page_idx;
 }
 
-/*
- * default pagesize bitmap, will be removed once all IOMMU drivers
- * are converted
- */
-static unsigned long default_iommu_pgsizes = ~0xFFFUL;
-
-void register_iommu(struct iommu_ops *ops)
-{
-	if (iommu_ops)
-		BUG();
-
-	iommu_ops = ops;
-
-	/*
-	 * set default pgsize values, which retain the existing
-	 * IOMMU API behavior: drivers will be called to map
-	 * regions that are sized/aligned to order of 4KiB pages
-	 */
-	iommu_pgsize_bitmap = &default_iommu_pgsizes;
-	iommu_nr_page_bits = BITS_PER_LONG;
-
-	/* find the minimum page size and its index only once */
-	iommu_min_page_idx = find_first_bit(iommu_pgsize_bitmap,
-						iommu_nr_page_bits);
-	iommu_min_pagesz = 1 << iommu_min_page_idx;
-}
-
 bool iommu_found(void)
 {
 	return iommu_ops != NULL;
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index a4ed116..e59ced9 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -733,7 +733,7 @@ static int __init msm_iommu_init(void)
 	setup_iommu_tex_classes();
 
 	/* we're only using the first 25 bits of the pgsizes bitmap */
-	register_iommu_pgsize(&msm_iommu_ops, &msm_iommu_pgsizes, 25);
+	register_iommu(&msm_iommu_ops, &msm_iommu_pgsizes, 25);
 
 	return 0;
 }
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 403dd6a..3d8ad87 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1229,7 +1229,7 @@ static int __init omap_iommu_init(void)
 	iopte_cachep = p;
 
 	/* we're only using the first 25 bits of the pgsizes bitmap */
-	register_iommu_pgsize(&omap_iommu_ops, &omap_iommu_pgsizes, 25);
+	register_iommu(&omap_iommu_ops, &omap_iommu_pgsizes, 25);
 
 	return platform_driver_register(&omap_iommu_driver);
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1806956..297893f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,9 +60,8 @@ struct iommu_ops {
 
 #ifdef CONFIG_IOMMU_API
 
-extern void register_iommu(struct iommu_ops *ops);
-extern void register_iommu_pgsize(struct iommu_ops *ops,
-		unsigned long *pgsize_bitmap, unsigned int nr_page_bits);
+extern void register_iommu(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
+					unsigned int nr_page_bits);
 extern bool iommu_found(void);
 extern struct iommu_domain *iommu_domain_alloc(void);
 extern void iommu_domain_free(struct iommu_domain *domain);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread
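
Only fragments of the surviving registration path appear in the hunks
above; for reference, a reconstructed sketch of the final
register_iommu() (field names as introduced in patch 1/6; the applied
body may differ slightly):

void register_iommu(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
					unsigned int nr_page_bits)
{
	/* exactly one IOMMU driver may register, and with a sane bitmap */
	if (iommu_ops || iommu_pgsize_bitmap || !nr_page_bits)
		BUG();

	iommu_ops = ops;
	iommu_pgsize_bitmap = pgsize_bitmap;
	iommu_nr_page_bits = nr_page_bits;

	/* cache the smallest advertised page size and its bit index */
	iommu_min_page_idx = find_first_bit(pgsize_bitmap, nr_page_bits);
	iommu_min_pagesz = 1 << iommu_min_page_idx;
}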

* Re: [PATCH v3 3/6] iommu/msm: announce supported page sizes
  2011-09-16 17:51   ` Ohad Ben-Cohen
@ 2011-09-25  5:03     ` David Brown
  -1 siblings, 0 replies; 106+ messages in thread
From: David Brown @ 2011-09-25  5:03 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko

On Fri, Sep 16, 2011 at 08:51:43PM +0300, Ohad Ben-Cohen wrote:
> Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes.
> 
> This way the IOMMU core can split any arbitrary-sized physically
> contiguous regions (that it needs to map) as needed.
> 
> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: David Brown <davidb@codeaurora.org>
> Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
> ---
>  drivers/iommu/msm_iommu.c |    8 +++++++-

Acked-by: David Brown <davidb@codeaurora.org>

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 106+ messages in thread
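
The four sizes acked above map onto the bitmap convention as follows;
the initializer is an illustrative reconstruction (the actual
msm_iommu_pgsizes definition lives in the patch body, which is not
quoted here), while the registration call is the one visible in the
later patches:

static unsigned long msm_iommu_pgsizes = (1UL << 12) |	/* 4KiB  */
					 (1UL << 16) |	/* 64KiB */
					 (1UL << 20) |	/* 1MiB  */
					 (1UL << 24);	/* 16MiB */

/* from msm_iommu_init(): highest bit is 24, hence "first 25 bits" */
register_iommu_pgsize(&msm_iommu_ops, &msm_iommu_pgsizes, 25);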

* Re: [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware
  2011-09-16 17:51 ` Ohad Ben-Cohen
@ 2011-09-27  8:56   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-27  8:56 UTC (permalink / raw)
  To: iommu, Joerg Roedel
  Cc: linux-omap, Hiroshi DOYU, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Ohad Ben-Cohen

ping

On Fri, Sep 16, 2011 at 8:51 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> v2->v3:
>
> - s/KB/KiB/ (David W)
>
> v1->v2:
>
> - split to patches (by keeping the old code around until all drivers are converted) (Joerg)
>
>
> Ohad Ben-Cohen (6):
>  iommu/core: split mapping to page sizes as supported by the hardware
>  iommu/omap: announce supported page sizes
>  iommu/msm: announce supported page sizes
>  iommu/amd: announce supported page sizes
>  iommu/intel: announce supported page sizes
>  iommu/core: remove the temporary register_iommu_pgsize API
>
>  drivers/iommu/amd_iommu.c   |   20 ++++++-
>  drivers/iommu/intel-iommu.c |   20 ++++++-
>  drivers/iommu/iommu.c       |  130 +++++++++++++++++++++++++++++++++++++++----
>  drivers/iommu/msm_iommu.c   |    8 ++-
>  drivers/iommu/omap-iommu.c  |    6 ++-
>  drivers/iommu/omap-iovmm.c  |   12 +---
>  include/linux/iommu.h       |    7 +-
>  virt/kvm/iommu.c            |    4 +-
>  8 files changed, 177 insertions(+), 30 deletions(-)
>
> --
> 1.7.4.1
>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-16 17:51   ` Ohad Ben-Cohen
@ 2011-09-27 10:05     ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-09-27 10:05 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Fri, Sep 16, 2011 at 01:51:41PM -0400, Ohad Ben-Cohen wrote:
>  drivers/iommu/iommu.c      |  158 +++++++++++++++++++++++++++++++++++++++++---
>  drivers/iommu/omap-iovmm.c |   12 +---
>  include/linux/iommu.h      |    6 +-
>  virt/kvm/iommu.c           |    4 +-
>  4 files changed, 157 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index c68ff29..7c01c8c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -16,6 +16,8 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>   */
> 
> +#define pr_fmt(fmt)    "%s: " fmt, __func__
> +
>  #include <linux/kernel.h>
>  #include <linux/bug.h>
>  #include <linux/types.h>
> @@ -23,15 +25,73 @@
>  #include <linux/slab.h>
>  #include <linux/errno.h>
>  #include <linux/iommu.h>
> +#include <linux/bitmap.h>
> 
>  static struct iommu_ops *iommu_ops;
> 
> +/* bitmap of supported page sizes */
> +static unsigned long *iommu_pgsize_bitmap;
> +
> +/* number of bits used to represent the supported pages */
> +static unsigned int iommu_nr_page_bits;
> +
> +/* size of the smallest supported page (in bytes) */
> +static unsigned int iommu_min_pagesz;
> +
> +/* bit number of the smallest supported page */
> +static unsigned int iommu_min_page_idx;
> +
> +/**
> + * register_iommu() - register an IOMMU hardware
> + * @ops: iommu handlers
> + * @pgsize_bitmap: bitmap of page sizes supported by the hardware
> + * @nr_page_bits: size of @pgsize_bitmap (in bits)
> + *
> + * Note: this is a temporary function, which will be removed once
> + * all IOMMU drivers are converted. The only reason it exists is to
> + * allow splitting the pgsizes changes to several patches in order to ease
> + * the review.
> + */
> +void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
> +                                       unsigned int nr_page_bits)

I think a plain unsigned long value is sufficient to encode the
supported page-sizes. No need to use a full bitmap.

>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -             phys_addr_t paddr, int gfp_order, int prot)
> +             phys_addr_t paddr, size_t size, int prot)
>  {
> -       size_t size;
> +       int ret = 0;
> +
> +       /*
> +        * both the virtual address and the physical one, as well as
> +        * the size of the mapping, must be aligned (at least) to the
> +        * size of the smallest page supported by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
> +                       "0x%x\n", iova, (unsigned long)paddr,
> +                       (unsigned long)size, iommu_min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
> +                               (unsigned long)paddr, (unsigned long)size);
> +
> +       while (size) {
> +               unsigned long pgsize = iommu_min_pagesz;
> +               unsigned long idx = iommu_min_page_idx;
> +               unsigned long addr_merge = iova | paddr;
> +               int order;
> +
> +               /* find the max page size with which iova, paddr are aligned */
> +               for (;;) {
> +                       unsigned long try_pgsize;
> +
> +                       idx = find_next_bit(iommu_pgsize_bitmap,
> +                                               iommu_nr_page_bits, idx + 1);
> +
> +                       /* no more pages to check ? */
> +                       if (idx >= iommu_nr_page_bits)
> +                               break;
> +
> +                       try_pgsize = 1 << idx;
> 
> -       size         = 0x1000UL << gfp_order;
> +                       /* page too big ? addresses not aligned ? */
> +                       if (size < try_pgsize ||
> +                                       !IS_ALIGNED(addr_merge, try_pgsize))
> +                               break;
> 
> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
> +                       pgsize = try_pgsize;
> +               }

With an unsigned long you can use plain and fast bit_ops instead of the
full bitmap functions. 

> 
> -       return iommu_ops->map(domain, iova, paddr, gfp_order, prot);
> +               order = get_order(pgsize);
> +
> +               pr_debug("mapping: iova 0x%lx pa 0x%lx order %d\n", iova,
> +                                       (unsigned long)paddr, order);
> +
> +               ret = iommu_ops->map(domain, iova, paddr, order, prot);
> +               if (ret)
> +                       break;
> +
> +               size -= pgsize;
> +               iova += pgsize;
> +               paddr += pgsize;
> +       }
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
> 

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread
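
As a concrete illustration of what the loop quoted above ends up doing
(my own worked example, not taken from the patch): with the MSM sizes
4KiB/64KiB/1MiB/16MiB and a request to map 0x111000 bytes at
iova = paddr = 0x100000, the splitting goes:

/*
 * iteration 1: iova 0x100000, remaining 0x111000 -> largest aligned
 *              fit is 1MiB  (covers 0x100000..0x1fffff)
 * iteration 2: iova 0x200000, remaining 0x011000 -> 1MiB is too big,
 *              picks 64KiB  (covers 0x200000..0x20ffff)
 * iteration 3: iova 0x210000, remaining 0x001000 -> picks 4KiB
 *
 * i.e. three calls into iommu_ops->map() instead of 0x111 page-sized ones.
 */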

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 10:05     ` Roedel, Joerg
@ 2011-09-27 12:26       ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-27 12:26 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Tue, Sep 27, 2011 at 1:05 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> On Fri, Sep 16, 2011 at 01:51:41PM -0400, Ohad Ben-Cohen wrote:
>>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
>> -             phys_addr_t paddr, int gfp_order, int prot)
>> +             phys_addr_t paddr, size_t size, int prot)
>>  {
>> -       size_t size;
>> +       int ret = 0;
>> +
>> +       /*
>> +        * both the virtual address and the physical one, as well as
>> +        * the size of the mapping, must be aligned (at least) to the
>> +        * size of the smallest page supported by the hardware
>> +        */
>> +       if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
>> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
>> +                       "0x%x\n", iova, (unsigned long)paddr,
>> +                       (unsigned long)size, iommu_min_pagesz);
>> +               return -EINVAL;
>> +       }
>> +
>> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
>> +                               (unsigned long)paddr, (unsigned long)size);
>> +
>> +       while (size) {
>> +               unsigned long pgsize = iommu_min_pagesz;
>> +               unsigned long idx = iommu_min_page_idx;
>> +               unsigned long addr_merge = iova | paddr;
>> +               int order;
>> +
>> +               /* find the max page size with which iova, paddr are aligned */
>> +               for (;;) {
>> +                       unsigned long try_pgsize;
>> +
>> +                       idx = find_next_bit(iommu_pgsize_bitmap,
>> +                                               iommu_nr_page_bits, idx + 1);
>> +
>> +                       /* no more pages to check ? */
>> +                       if (idx >= iommu_nr_page_bits)
>> +                               break;
>> +
>> +                       try_pgsize = 1 << idx;
>>
>> -       size         = 0x1000UL << gfp_order;
>> +                       /* page too big ? addresses not aligned ? */
>> +                       if (size < try_pgsize ||
>> +                                       !IS_ALIGNED(addr_merge, try_pgsize))
>> +                               break;
>>
>> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
>> +                       pgsize = try_pgsize;
>> +               }
>
> With an unsigned long you can use plain and fast bit_ops instead of the
> full bitmap functions.

Not sure I follow; the only bit operation I'm using while mapping is
find_next_bit() (which is a bitops.h method).

What other faster variant are you referring to?

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 12:26       ` Ohad Ben-Cohen
@ 2011-09-27 13:12         ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-09-27 13:12 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Tue, Sep 27, 2011 at 08:26:29AM -0400, Ohad Ben-Cohen wrote:
> > With an unsigned long you can use plain and fast bit_ops instead of the
> > full bitmap functions.
> 
> Not sure I follow; the only bit operation I'm using while mapping is
> find_next_bit() (which is a bitops.h method).
> 
> What other faster variant are you referring to ?

You pass a pointer to an unsigned long for the page-size bitmap. This
allows the use of an array of unsigned long. But a single unsigned long is
sufficient and you can use functions like ffs() and fls() together with
shifting. These functions often translate to a single instruction in the
binary. The find_next_bit() function has much more overhead because it
needs to handle the array-of-ulong case.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread
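
A rough sketch of the suggestion above (my reconstruction, not code
posted in this thread): with a plain unsigned long bitmap, the
"largest supported page size that both fits and is aligned" can be
picked with single-word bitops and no find_next_bit() loop.
pick_pgsize() is a hypothetical name:

static unsigned long pick_pgsize(unsigned long pgsize_bitmap,
				 unsigned long addr_merge, size_t size)
{
	unsigned int bit;
	unsigned long candidates;

	/* keep only sizes no larger than the remaining length (size != 0) */
	bit = __fls(size);
	candidates = pgsize_bitmap & (bit + 1 < BITS_PER_LONG ?
				      (1UL << (bit + 1)) - 1 : ~0UL);

	/* and only sizes that respect the alignment of iova | paddr */
	if (addr_merge) {
		bit = __ffs(addr_merge);
		candidates &= (bit + 1 < BITS_PER_LONG ?
			       (1UL << (bit + 1)) - 1 : ~0UL);
	}

	/* the iommu_map() prologue guarantees the minimum size qualifies */
	BUG_ON(!candidates);

	return 1UL << __fls(candidates);
}

In the caller this would replace the inner for (;;) over
find_next_bit(); the while (size) loop then just picks a size, maps
it, and advances iova, paddr and size accordingly.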

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 13:12         ` Roedel, Joerg
  (?)
@ 2011-09-27 13:28           ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-27 13:28 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Tue, Sep 27, 2011 at 4:12 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> You pass a pointer to an unsigned long for the page-size bitmap. This
> allows to use an array of unsigned long. But a single unsigned long is
> sufficient

This is fine; I can change that if you like it better (though without
doing the change below this is probably moot).

> and you can use functions like ffs() and fls() together with
> shifting. These functions often translate to a single intruction in the
> binary. The find_next_bit function has much more overhead because it
> needs to handle the array-of-ulong case.

So you're suggesting to re-implement find_next_bit() using ffs()/fls()
and shifting ?

What's the point ?

Sure, if we'll have a proven performance issue while using
find_next_bit() we can think of doing this, but at this stage, this
sounds to me like a premature optimization which isn't too elegant.

At this point I strongly prefer readable, maintainable and easier to
debug code over optimization which isn't proven to be necessary.

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 13:28           ` Ohad Ben-Cohen
  (?)
@ 2011-09-27 18:14             ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-09-27 18:14 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Tue, Sep 27, 2011 at 09:28:37AM -0400, Ohad Ben-Cohen wrote:
> So you're suggesting to re-implement find_next_bit() using ffs()/fls()
> and shifting ?

No. I suggest a simpler and shorter algorithm using the bit helpers.
Something like that:

        min_idx = __ffs(iommu_page_sizes);

        while (size) {
                /* Max alignment allowed by current physical address */
                phys_idx = __ffs(phys);

                /* Max alignment allowed by current size */
                size_idx = __fls(size);

                /* special case: iova == 0 */
                if (likely(phys))
                        idx = min(phys_idx, size_idx);
                else
                        idx = size_idx;

                BUG_ON(idx < min_idx);

                psize = 1UL << idx;

                /* search next smaller page-size supported */
                while (psize && !(iommu_page_sizes & psize))
                        psize >>= 1;

                BUG_ON(psize == 0);

		iommu_ops->map(domain, iova, phys, get_order(psize), prot);

                iova += psize;
		phys += psize;
                size -= psize;
        }

It is only C-style pseudo-code, of course. These __ffs and __fls lines
all translate to a single instruction later. The find_next_bit()
function has a lot more overhead because it needs to take account of
real bitmaps (arrays of ulong). But this complexity is not required
here.
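
To put numbers on it (an arbitrary example, assuming the hardware
supports 4KiB, 64KiB and 1MiB pages): mapping iova 0x10000, phys
0xa0000, size 0x11000 would first pick a 64KiB page (both addresses
are 64KiB-aligned and 64KiB fits in the size), leaving 4KiB at iova
0x20000 / phys 0xb0000, which then gets mapped with a single 4KiB
page.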

And yes, overhead is important when we implement the generic dma-ops
on-top of the iommu-api because this will make the iommu_map function a
fast-path. So we really care about overhead here.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 18:14             ` Roedel, Joerg
  (?)
@ 2011-10-02 15:58               ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-02 15:58 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

(sorry for the late response; we had a big holiday here and I was
forced away from my keyboard :)

On Tue, Sep 27, 2011 at 9:14 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> No. I suggest a simpler and shorter algorithm using the bit helpers.

Ok, fair enough. I've revised the patches and attached the main one
below; please tell me if it looks ok, and then I'll resubmit the
entire patch set.

The main changes I've done from your pseudo-code are:

1. removed the internal while loop, and used several bit
manipulations to replace it
2. removed the call to get_order(), and instead used the bytes order
we already have to deduce it
3. considered iova alignment requirements too
4. removed the BUG_ON()s, since we're checking for minimal
size/alignment conditions in the beginning of the map function

> And yes, overhead is important when we implement the generic dma-ops
> on-top of the iommu-api because this will make the iommu_map function a
> fast-path. So we really care about overhead here.

Don't get me wrong; I don't underestimate the importance of performance here.

I just think that as long as some code is reasonably efficient and not
badly designed (so it's not too hard to optimize later if needed),
then it's usually better to postpone optimizations until we benchmark
and know for certain that the effort in squeezing away several cpu
cycles is noticeable.

In this case, I'm not entirely convinced that it is, because every map
call anyway ends up flushing the caches, which might take hundreds of
cpu cycles. In addition, at least the ARM version of find_next_bit
doesn't look terribly bad.

YMMV though: I didn't find an optimized assembly implementation of
find_next_bit for x86, and the generic one does look complex. In
addition, the variable page size behavior of the x86 iommu drivers may
have caused much more iterations than any of the ARM drivers would
have had, so the penalty of running that generic find_next_bit
repeatedly could have accumulated into something noticeable.

Anyway, here comes the revised patch.

Thanks for your review!
Ohad.

From 042690394b1a121202e0eeeb47d267b1a8d65132 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Wed, 31 Aug 2011 16:46:47 +0300
Subject: [PATCH 2/7] iommu/core: split mapping to page sizes as
supported by the hardware

When mapping a memory region, split it to page sizes as supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce the TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.

This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.

Currently the supported page sizes are advertised once and they then
remain static. That works well for OMAP and MSM but it would probably
not fly well with intel's hardware, where the page size capabilities
seem to have the potential to be different between several DMA
remapping devices.

As requested, register_iommu() isn't changed yet, so we can convert
the IOMMU drivers in subsequent patches, and after all the drivers
are converted, register_iommu will be changed (and the temporary
register_iommu_pgsize() will be removed).

Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted
to send the mapping size in bytes instead of in page order.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  138 ++++++++++++++++++++++++++++++++++++++++---
 drivers/iommu/omap-iovmm.c |   12 +---
 include/linux/iommu.h      |    6 +-
 virt/kvm/iommu.c           |    4 +-
 4 files changed, 137 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a7b0862..f23563f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */

+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/kernel.h>
 #include <linux/bug.h>
 #include <linux/types.h>
@@ -23,15 +25,54 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/iommu.h>
+#include <linux/bitmap.h>

 static struct iommu_ops *iommu_ops;

+/* bitmap of supported page sizes */
+static unsigned long iommu_pgsize_bitmap;
+
+/* size of the smallest supported page (in bytes) */
+static unsigned int iommu_min_pagesz;
+
+/**
+ * register_iommu() - register an IOMMU hardware
+ * @ops: iommu handlers
+ * @pgsize_bitmap: bitmap of page sizes supported by the hardware
+ *
+ * Note: this is a temporary function, which will be removed once
+ * all IOMMU drivers are converted. The only reason it exists is to
+ * allow splitting the pgsizes changes to several patches in order to ease
+ * the review.
+ */
+void register_iommu_pgsize(struct iommu_ops *ops, unsigned long pgsize_bitmap)
+{
+	if (iommu_ops || iommu_pgsize_bitmap || !pgsize_bitmap)
+		BUG();
+
+	iommu_ops = ops;
+	iommu_pgsize_bitmap = pgsize_bitmap;
+
+	/* find out the minimum page size only once */
+	iommu_min_pagesz = 1 << __ffs(pgsize_bitmap);
+}
+
 void register_iommu(struct iommu_ops *ops)
 {
 	if (iommu_ops)
 		BUG();

 	iommu_ops = ops;
+
+	/*
+	 * set default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages
+	 */
+	iommu_pgsize_bitmap = ~0xFFFUL;
+
+	/* find out the minimum page size only once */
+	iommu_min_pagesz = 1 << __ffs(iommu_pgsize_bitmap);
 }

 bool iommu_found(void)
@@ -115,26 +156,103 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);

 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	int ret = 0;
+
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			(unsigned long)size, iommu_min_pagesz);
+		return -EINVAL;
+	}

-	size         = 0x1000UL << gfp_order;
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
+				(unsigned long)paddr, (unsigned long)size);

-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;

-	return iommu_ops->map(domain, iova, paddr, gfp_order, prot);
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= iommu_pgsize_bitmap;
+
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;
+
+		/* convert index to page order */
+		pgsize_idx -= PAGE_SHIFT;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova,
+					(unsigned long)paddr, pgsize_idx);
+
+		ret = iommu_ops->map(domain, iova, paddr, pgsize_idx, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);

-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+int iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	size_t size;
+	int order, unmapped_size, unmapped_order, total_unmapped = 0;
+
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, iommu_min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n",
+				iova, (unsigned long)size, iommu_min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova,
+							(unsigned long)size);
+
+	while (size > total_unmapped) {
+		order = get_order(size - total_unmapped);
+
+		unmapped_order = iommu_ops->unmap(domain, iova, order);
+		if (unmapped_order < 0)
+			return unmapped_order;
+
+		pr_debug("unmapped: iova 0x%lx order %d\n", iova,
+							unmapped_order);

-	size         = 0x1000UL << gfp_order;
+		unmapped_size = 0x1000UL << unmapped_order;

-	BUG_ON(!IS_ALIGNED(iova, size));
+		iova += unmapped_size;
+		total_unmapped += unmapped_size;
+	}

-	return iommu_ops->unmap(domain, iova, gfp_order);
+	return get_order(total_unmapped);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..f4dea5a 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain
*domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;

 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain
*domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;

-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);

-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;

@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);

 		da += bytes;
 	}
@@ -474,12 +470,10 @@ static void unmap_iovm_area(struct iommu_domain
*domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

-		err = iommu_unmap(domain, start, order);
+		err = iommu_unmap(domain, start, bytes);
 		if (err < 0)
 			break;

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2987199..3b34938 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -61,6 +61,8 @@ struct iommu_ops {
 #ifdef CONFIG_IOMMU_API

 extern void register_iommu(struct iommu_ops *ops);
+extern void register_iommu_pgsize(struct iommu_ops *ops,
+				unsigned long pgsize_bitmap);
 extern bool iommu_found(void);
 extern struct iommu_domain *iommu_domain_alloc(void);
 extern void iommu_domain_free(struct iommu_domain *domain);
@@ -69,9 +71,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
+		     phys_addr_t paddr, size_t size, int prot);
 extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 78c80f6..ea142d3 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct
kvm_memory_slot *slot)

 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -293,7 +293,7 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 		pfn  = phys >> PAGE_SHIFT;

 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
+		order       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
 		unmap_pages = 1ULL << order;

 		/* Unpin all pages we just unmapped to not leak any memory */
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-02 15:58               ` Ohad Ben-Cohen
  (?)
@ 2011-10-10  7:40                 ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10  7:40 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Sun, Oct 2, 2011 at 5:58 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> Ok, fair enough. I've revised the patches and attached the main one
> below; please tell me if it looks ok, and then I'll resubmit the
> entire patch set.

Ping ?

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-02 15:58               ` Ohad Ben-Cohen
  (?)
@ 2011-10-10  9:47                 ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-10  9:47 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

Hi Ohad,

sorry, I was on vacation last week and had no time to look into this.

On Sun, Oct 02, 2011 at 11:58:12AM -0400, Ohad Ben-Cohen wrote:
>  drivers/iommu/iommu.c      |  138 ++++++++++++++++++++++++++++++++++++++++---
>  drivers/iommu/omap-iovmm.c |   12 +---
>  include/linux/iommu.h      |    6 +-
>  virt/kvm/iommu.c           |    4 +-
>  4 files changed, 137 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index a7b0862..f23563f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -16,6 +16,8 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>   */
> 
> +#define pr_fmt(fmt)    "%s: " fmt, __func__
> +
>  #include <linux/kernel.h>
>  #include <linux/bug.h>
>  #include <linux/types.h>
> @@ -23,15 +25,54 @@
>  #include <linux/slab.h>
>  #include <linux/errno.h>
>  #include <linux/iommu.h>
> +#include <linux/bitmap.h>

Is this still required?

> 
>  static struct iommu_ops *iommu_ops;
> 
> +/* bitmap of supported page sizes */
> +static unsigned long iommu_pgsize_bitmap;
> +
> +/* size of the smallest supported page (in bytes) */
> +static unsigned int iommu_min_pagesz;
> +
> +/**
> + * register_iommu() - register an IOMMU hardware
> + * @ops: iommu handlers
> + * @pgsize_bitmap: bitmap of page sizes supported by the hardware
> + *
> + * Note: this is a temporary function, which will be removed once
> + * all IOMMU drivers are converted. The only reason it exists is to
> + * allow splitting the pgsizes changes to several patches in order to ease
> + * the review.
> + */
> +void register_iommu_pgsize(struct iommu_ops *ops, unsigned long pgsize_bitmap)
> +{
> +       if (iommu_ops || iommu_pgsize_bitmap || !pgsize_bitmap)
> +               BUG();
> +
> +       iommu_ops = ops;
> +       iommu_pgsize_bitmap = pgsize_bitmap;
> +
> +       /* find out the minimum page size only once */
> +       iommu_min_pagesz = 1 << __ffs(pgsize_bitmap);
> +}

Hmm, I thought a little bit about that and came to the conclusion that it
might be best to just keep the page-sizes as a part of the iommu_ops
structure. So there is no need to extend the register_iommu interface.
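
Something along these lines (only a sketch of the idea, not a final
interface):

	struct iommu_ops {
		int (*map)(struct iommu_domain *domain, unsigned long iova,
			   phys_addr_t paddr, int gfp_order, int prot);
		/* ... the other existing callbacks ... */
		unsigned long pgsize_bitmap;	/* page sizes supported by the hw */
	};

Each driver then just fills in its own bitmap when it defines its ops,
and the core can read it from there.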

Also, the bus_set_iommu interface is now in the -next branch. Would be
good if you rebase the patches to that interface.

You can find the current iommu tree with these changes at

	git://git.8bytes.org/scm/iommu.git

> @@ -115,26 +156,103 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
>  EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
> 
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -             phys_addr_t paddr, int gfp_order, int prot)
> +             phys_addr_t paddr, size_t size, int prot)
>  {
> -       size_t size;
> +       int ret = 0;
> +
> +       /*
> +        * both the virtual address and the physical one, as well as
> +        * the size of the mapping, must be aligned (at least) to the
> +        * size of the smallest page supported by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
> +                       "0x%x\n", iova, (unsigned long)paddr,
> +                       (unsigned long)size, iommu_min_pagesz);
> +               return -EINVAL;
> +       }
> 
> -       size         = 0x1000UL << gfp_order;
> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
> +                               (unsigned long)paddr, (unsigned long)size);
> 
> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
> +       while (size) {
> +               unsigned long pgsize, addr_merge = iova | paddr;
> +               unsigned int pgsize_idx;
> 
> -       return iommu_ops->map(domain, iova, paddr, gfp_order, prot);
> +               /* Max page size that still fits into 'size' */
> +               pgsize_idx = __fls(size);
> +
> +               /* need to consider alignment requirements ? */
> +               if (likely(addr_merge)) {
> +                       /* Max page size allowed by both iova and paddr */
> +                       unsigned int align_pgsize_idx = __ffs(addr_merge);
> +
> +                       pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> +               }
> +
> +               /* build a mask of acceptable page sizes */
> +               pgsize = (1UL << (pgsize_idx + 1)) - 1;
> +
> +               /* throw away page sizes not supported by the hardware */
> +               pgsize &= iommu_pgsize_bitmap;

I think we need some care here and check pgsize for 0. A BUG_ON should
do.
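
I.e. something like (sketch):

		/* throw away page sizes not supported by the hardware */
		pgsize &= iommu_pgsize_bitmap;

		/* the min_pagesz check at entry should make this impossible */
		BUG_ON(!pgsize);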

> +
> +               /* pick the biggest page */
> +               pgsize_idx = __fls(pgsize);
> +               pgsize = 1UL << pgsize_idx;
> +
> +               /* convert index to page order */
> +               pgsize_idx -= PAGE_SHIFT;
> +
> +               pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova,
> +                                       (unsigned long)paddr, pgsize_idx);
> +
> +               ret = iommu_ops->map(domain, iova, paddr, pgsize_idx, prot);
> +               if (ret)
> +                       break;
> +
> +               iova += pgsize;
> +               paddr += pgsize;
> +               size -= pgsize;
> +       }
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);

Otherwise this looks much better, thanks.

> 
> -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
> +int iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
> -       size_t size;
> +       int order, unmapped_size, unmapped_order, total_unmapped = 0;
> +
> +       /*
> +        * The virtual address, as well as the size of the mapping, must be
> +        * aligned (at least) to the size of the smallest page supported
> +        * by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | size, iommu_min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n",
> +                               iova, (unsigned long)size, iommu_min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova,
> +                                                       (unsigned long)size);
> +
> +       while (size > total_unmapped) {
> +               order = get_order(size - total_unmapped);
> +
> +               unmapped_order = iommu_ops->unmap(domain, iova, order);

I think we should make sure that we call iommu_ops->unmap with the same
parameters as iommu_ops->map. Otherwise we still need some page-size
complexity in the iommu-drivers.

> +               if (unmapped_order < 0)
> +                       return unmapped_order;
> +
> +               pr_debug("unmapped: iova 0x%lx order %d\n", iova,
> +                                                       unmapped_order);
> 
> -       size         = 0x1000UL << gfp_order;
> +               unmapped_size = 0x1000UL << unmapped_order;
> 
> -       BUG_ON(!IS_ALIGNED(iova, size));
> +               iova += unmapped_size;
> +               total_unmapped += unmapped_size;
> +       }
> 
> -       return iommu_ops->unmap(domain, iova, gfp_order);
> +       return get_order(total_unmapped);
>  }

When we pass the size now it makes sense to also return the
unmapped-size instead of the order.

Also the semantics of iommu_unmap need some more discussion I think. We
should require that iommu_map/iommu_unmap are called with the same
parameters for the same range. Otherwise we can give no guarantees what
an unmap call will unmap.
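
I.e. (sketch) users would be expected to do:

	r = iommu_map(domain, iova, paddr, size, prot);
	...
	iommu_unmap(domain, iova, size);	/* same iova and size as the map */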

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-10-10  9:47                 ` Roedel, Joerg
  0 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-10  9:47 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

Hi Ohad,

sorry, I was on vacation last week and had no time to look into this.

On Sun, Oct 02, 2011 at 11:58:12AM -0400, Ohad Ben-Cohen wrote:
>  drivers/iommu/iommu.c      |  138 ++++++++++++++++++++++++++++++++++++++++---
>  drivers/iommu/omap-iovmm.c |   12 +---
>  include/linux/iommu.h      |    6 +-
>  virt/kvm/iommu.c           |    4 +-
>  4 files changed, 137 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index a7b0862..f23563f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -16,6 +16,8 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>   */
> 
> +#define pr_fmt(fmt)    "%s: " fmt, __func__
> +
>  #include <linux/kernel.h>
>  #include <linux/bug.h>
>  #include <linux/types.h>
> @@ -23,15 +25,54 @@
>  #include <linux/slab.h>
>  #include <linux/errno.h>
>  #include <linux/iommu.h>
> +#include <linux/bitmap.h>

Is this still required?

> 
>  static struct iommu_ops *iommu_ops;
> 
> +/* bitmap of supported page sizes */
> +static unsigned long iommu_pgsize_bitmap;
> +
> +/* size of the smallest supported page (in bytes) */
> +static unsigned int iommu_min_pagesz;
> +
> +/**
> + * register_iommu() - register an IOMMU hardware
> + * @ops: iommu handlers
> + * @pgsize_bitmap: bitmap of page sizes supported by the hardware
> + *
> + * Note: this is a temporary function, which will be removed once
> + * all IOMMU drivers are converted. The only reason it exists is to
> + * allow splitting the pgsizes changes to several patches in order to ease
> + * the review.
> + */
> +void register_iommu_pgsize(struct iommu_ops *ops, unsigned long pgsize_bitmap)
> +{
> +       if (iommu_ops || iommu_pgsize_bitmap || !pgsize_bitmap)
> +               BUG();
> +
> +       iommu_ops = ops;
> +       iommu_pgsize_bitmap = pgsize_bitmap;
> +
> +       /* find out the minimum page size only once */
> +       iommu_min_pagesz = 1 << __ffs(pgsize_bitmap);
> +}

Hmm, I thought a little bit about that and came to the conclusion that it
might be best to just keep the page-sizes as a part of the iommu_ops
structure. So there is no need to extend the register_iommu interface.

Also, the bus_set_iommu interface is now in the -next branch. Would be
good if you rebase the patches to that interface.

You can find the current iommu tree with these changes at

	git://git.8bytes.org/scm/iommu.git

> @@ -115,26 +156,103 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
>  EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
> 
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -             phys_addr_t paddr, int gfp_order, int prot)
> +             phys_addr_t paddr, size_t size, int prot)
>  {
> -       size_t size;
> +       int ret = 0;
> +
> +       /*
> +        * both the virtual address and the physical one, as well as
> +        * the size of the mapping, must be aligned (at least) to the
> +        * size of the smallest page supported by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
> +                       "0x%x\n", iova, (unsigned long)paddr,
> +                       (unsigned long)size, iommu_min_pagesz);
> +               return -EINVAL;
> +       }
> 
> -       size         = 0x1000UL << gfp_order;
> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
> +                               (unsigned long)paddr, (unsigned long)size);
> 
> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
> +       while (size) {
> +               unsigned long pgsize, addr_merge = iova | paddr;
> +               unsigned int pgsize_idx;
> 
> -       return iommu_ops->map(domain, iova, paddr, gfp_order, prot);
> +               /* Max page size that still fits into 'size' */
> +               pgsize_idx = __fls(size);
> +
> +               /* need to consider alignment requirements ? */
> +               if (likely(addr_merge)) {
> +                       /* Max page size allowed by both iova and paddr */
> +                       unsigned int align_pgsize_idx = __ffs(addr_merge);
> +
> +                       pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> +               }
> +
> +               /* build a mask of acceptable page sizes */
> +               pgsize = (1UL << (pgsize_idx + 1)) - 1;
> +
> +               /* throw away page sizes not supported by the hardware */
> +               pgsize &= iommu_pgsize_bitmap;

I think we need some care here and check pgsize for 0. A BUG_ON should
do.
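
I.e. something along these lines (sketch only):

	/* throw away page sizes not supported by the hardware */
	pgsize &= iommu_pgsize_bitmap;

	/* an empty mask here means the alignment check above is broken */
	BUG_ON(!pgsize);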

> +
> +               /* pick the biggest page */
> +               pgsize_idx = __fls(pgsize);
> +               pgsize = 1UL << pgsize_idx;
> +
> +               /* convert index to page order */
> +               pgsize_idx -= PAGE_SHIFT;
> +
> +               pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova,
> +                                       (unsigned long)paddr, pgsize_idx);
> +
> +               ret = iommu_ops->map(domain, iova, paddr, pgsize_idx, prot);
> +               if (ret)
> +                       break;
> +
> +               iova += pgsize;
> +               paddr += pgsize;
> +               size -= pgsize;
> +       }
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);

Otherwise this looks much better, thanks.

> 
> -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
> +int iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
> -       size_t size;
> +       int order, unmapped_size, unmapped_order, total_unmapped = 0;
> +
> +       /*
> +        * The virtual address, as well as the size of the mapping, must be
> +        * aligned (at least) to the size of the smallest page supported
> +        * by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | size, iommu_min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n",
> +                               iova, (unsigned long)size, iommu_min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova,
> +                                                       (unsigned long)size);
> +
> +       while (size > total_unmapped) {
> +               order = get_order(size - total_unmapped);
> +
> +               unmapped_order = iommu_ops->unmap(domain, iova, order);

I think we should make sure that we call iommu_ops->unmap with the same
parameters as iommu_ops->map. Otherwise we still need some page-size
complexity in the iommu-drivers.

> +               if (unmapped_order < 0)
> +                       return unmapped_order;
> +
> +               pr_debug("unmapped: iova 0x%lx order %d\n", iova,
> +                                                       unmapped_order);
> 
> -       size         = 0x1000UL << gfp_order;
> +               unmapped_size = 0x1000UL << unmapped_order;
> 
> -       BUG_ON(!IS_ALIGNED(iova, size));
> +               iova += unmapped_size;
> +               total_unmapped += unmapped_size;
> +       }
> 
> -       return iommu_ops->unmap(domain, iova, gfp_order);
> +       return get_order(total_unmapped);
>  }

When we pass the size now it makes sense to also return the
unmapped-size instead of the order.

Also the semantics of iommu_unmap need some more discussion I think. We
should require that iommu_map/iommu_unmap are called with the same
parameters for the same range. Otherwise we can give no guarantees about
what an unmap call will unmap.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-02 15:58               ` Ohad Ben-Cohen
  (?)
@ 2011-10-10 12:52                 ` KyongHo Cho
  -1 siblings, 0 replies; 106+ messages in thread
From: KyongHo Cho @ 2011-10-10 12:52 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: Roedel, Joerg, iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Mon, Oct 3, 2011 at 12:58 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -             phys_addr_t paddr, int gfp_order, int prot)
> +             phys_addr_t paddr, size_t size, int prot)
>  {
> -       size_t size;
> +       int ret = 0;
> +
> +       /*
> +        * both the virtual address and the physical one, as well as
> +        * the size of the mapping, must be aligned (at least) to the
> +        * size of the smallest page supported by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
> +                       "0x%x\n", iova, (unsigned long)paddr,
> +                       (unsigned long)size, iommu_min_pagesz);
> +               return -EINVAL;
> +       }
>
> -       size         = 0x1000UL << gfp_order;
> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
> +                               (unsigned long)paddr, (unsigned long)size);
>
> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
> +       while (size) {
> +               unsigned long pgsize, addr_merge = iova | paddr;
> +               unsigned int pgsize_idx;
>
> -       return iommu_ops->map(domain, iova, paddr, gfp_order, prot);
> +               /* Max page size that still fits into 'size' */
> +               pgsize_idx = __fls(size);
> +
> +               /* need to consider alignment requirements ? */
> +               if (likely(addr_merge)) {
> +                       /* Max page size allowed by both iova and paddr */
> +                       unsigned int align_pgsize_idx = __ffs(addr_merge);
> +
> +                       pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> +               }
> +
> +               /* build a mask of acceptable page sizes */
> +               pgsize = (1UL << (pgsize_idx + 1)) - 1;
> +
> +               /* throw away page sizes not supported by the hardware */
> +               pgsize &= iommu_pgsize_bitmap;
> +
> +               /* pick the biggest page */
> +               pgsize_idx = __fls(pgsize);
> +               pgsize = 1UL << pgsize_idx;
> +
> +               /* convert index to page order */
> +               pgsize_idx -= PAGE_SHIFT;
> +
> +               pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova,
> +                                       (unsigned long)paddr, pgsize_idx);
> +
> +               ret = iommu_ops->map(domain, iova, paddr, pgsize_idx, prot);
> +               if (ret)
> +                       break;
> +
> +               iova += pgsize;
> +               paddr += pgsize;
> +               size -= pgsize;
> +       }
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);

Don't we need to unmap all intermediate mappings if iommu_map() fails?

I think iommu_map() must map the entire given area if it succeeds.
Otherwise, it should map nothing.

I think it can be done simply by calling iommu_unmap(),
following Joerg's previous suggestion about iommu_unmap().
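
Roughly like this, for example (just a sketch of the idea; it assumes
iommu_unmap() takes the size in bytes, and that the mapping loop
decrements 'size' as it goes, as in the patch):

	unsigned long orig_iova = iova;
	size_t orig_size = size;

	/* ... alignment checks and the mapping loop ... */

	/* roll back partial mappings if the loop bailed out early */
	if (ret && orig_size != size)
		iommu_unmap(domain, orig_iova, orig_size - size);

	return ret;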

Regards,
KyongHo.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10  9:47                 ` Roedel, Joerg
  (?)
@ 2011-10-10 13:59                   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10 13:59 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

[ -bouncing Hiroshi.DOYU@nokia.com, +not-bouncing hdoyu@nvidia.com :
hi Hiroshi :) ]

Hi Joerg,

On Mon, Oct 10, 2011 at 11:47 AM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> sorry, I was on vacation last week and had no time to look into this.

Sure thing, thanks for replying!

>> +#include <linux/bitmap.h>
>
> Is this still required?

Nope, removed, thanks.

> Hmm, I thought a little bit about that and came to the conclusion it
> might be best to just keep the page-sizes as a part of the iommu_ops
> structure. So there is no need to extend the register_iommu interface.

Sure. That was one of my initial alternatives, but I decided against
it at that time. I'll bring it back - it will help with the
bus_set_iommu rebasing.

> Also, the bus_set_iommu interface is now in the -next branch. Would be
> good if you rebase the patches to that interface.

Sure. It's a little tricky though: which branch do I base this on ?
Are you ok with me basing this on your 'next' branch ? My current
stack depends at least on three branches of yours, so that would be
helpful for me (and less merging conflicts for you I guess :).

> I think we need some care here and check pgsize for 0. A BUG_ON should
> do.

I can add it if you prefer, but I don't think it can really happen:
basically, it means that we chose a too small and unsupported page
bit, which can't happen as long as we check for IS_ALIGNED(iova |
paddr | size, iommu_min_pagesz) in the beginning of iommu_map.
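
For example, if the smallest supported page is 4 KiB, the alignment
check guarantees that iova, paddr and size are all multiples of 0x1000,
so __fls(size) >= 12 and (whenever iova | paddr is non-zero)
__ffs(iova | paddr) >= 12 as well; the candidate mask built from those
indices therefore always contains bit 12, which is set in
iommu_pgsize_bitmap by definition of the minimum page size.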

>> +               unmapped_order = iommu_ops->unmap(domain, iova, order);
>
> I think we should make sure that we call iommu_ops->unmap with the same
> parameters as iommu_ops->map. Otherwise we still need some page-size
> complexity in the iommu-drivers.

Ok, let's discuss the semantics of ->unmap().

There isn't a clear documentation of that API (we should probably add
some kernel docs after we nail it down now), but judging from the
existing users (mainly kvm) and drivers, it seems that iommu_map() and
iommu_unmap() aren't symmetric: users rely on unmap() to return the
actual size that was unmapped. IOMMU drivers, in turn, should check
which page is mapped on 'iova', unmap it, and return its size.

This way iommu_unmap() becomes very simple: it just iterates through
the region, relying on iommu_ops->unmap() to return the sizes that
were actually unmapped (very similar to how amd's iommu_unmap_page
works today). This also means that iommu_ops->unmap() doesn't really
need a size/order argument and we can remove it (after all drivers
fully migrate..).
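
Roughly (just a sketch; it assumes ->unmap() is changed to take only
the iova and to return the number of bytes it actually unmapped there):

	int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
			size_t size)
	{
		size_t unmapped = 0;

		/* same alignment check on iova | size as in iommu_map() */

		while (unmapped < size) {
			size_t page_size = iommu_ops->unmap(domain, iova);

			if (!page_size)
				break;

			iova += page_size;
			unmapped += page_size;
		}

		return unmapped;
	}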

The other approach which you suggest means symmetric iommu_map() and
iommu_unmap(). It means adding a 'paddr' parameter to iommu_unmap(),
which is easy, but maybe more concerning is the limitation that it
incurs: users will now have to call iommu_unmap() exactly as they
called iommu_map() beforehand. Not sure how well this will fly with
the existing users (kvm ?) and whether we really want to enforce this
(it doesn't mean drivers need to deal with page-size complexity. they
are required to unmap a single page at a time, and iommu_unmap() will
do the work for them).

Another discussion:

I think we better change iommu_ops->map() to directly take a 'size'
(in bytes) instead of an 'order' (of pages). Most (all?) drivers just
immediately do 'size = 0x1000UL << gfp_order', so this whole size ->
order -> size back and forth seems redundant.
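
I.e. roughly (sketch):

	/* today */
	int (*map)(struct iommu_domain *domain, unsigned long iova,
		   phys_addr_t paddr, int gfp_order, int prot);

	/* proposed */
	int (*map)(struct iommu_domain *domain, unsigned long iova,
		   phys_addr_t paddr, size_t size, int prot);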

> When we pass the size now it makes sense to also return the
> unmapped-size instead of the order.

Sure.

Thanks for your review,
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 12:52                 ` KyongHo Cho
  (?)
@ 2011-10-10 14:00                   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10 14:00 UTC (permalink / raw)
  To: KyongHo Cho
  Cc: Roedel, Joerg, iommu, linux-omap, Hiroshi DOYU, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm

On Mon, Oct 10, 2011 at 2:52 PM, KyongHo Cho <pullip.cho@samsung.com> wrote:
> Don't we need to unmap all intermediate mappings if iommu_map() fails?

Good idea, I'll add it.

Thanks!
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 13:59                   ` Ohad Ben-Cohen
  (?)
@ 2011-10-10 15:36                     ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-10 15:36 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

Hi Ohad,

On Mon, Oct 10, 2011 at 09:59:22AM -0400, Ohad Ben-Cohen wrote:
> > Also, the bus_set_iommu interface is now in the -next branch. Would be
> > good if you rebase the patches to that interface.
> 
> Sure. It's a little tricky though: which branch do I base this on ?
> Are you ok with me basing this on your 'next' branch ? My current
> stack depends at least on three branches of yours, so that would be
> helpful for me (and less merging conflicts for you I guess :).

The master branch is best to base your patches on for generic work. For
more specific things like omap-only changes you can use the topic
branches.

> > I think we need some care here and check pgsize for 0. A BUG_ON should
> > do.
> 
> I can add it if you prefer, but I don't think it can really happen:
> basically, it means that we chose a too small and unsupported page
> bit, which can't happen as long as we check for IS_ALIGNED(iova |
> paddr | size, iommu_min_pagesz) in the beginning of iommu_map.

It can happen when there is a bug somewhere :) So a BUG_ON will yell
then and make debugging easier. An alternative is to use a WARN_ON and
let the map-call fail in this case.

> Ok, let's discuss the semantics of ->unmap().
> 
> There isn't a clear documentation of that API (we should probably add
> some kernel docs after we nail it down now), but judging from the
> existing users (mainly kvm) and drivers, it seems that iommu_map() and
> iommu_unmap() aren't symmetric: users rely on unmap() to return the
> actual size that was unmapped. IOMMU drivers, in turn, should check
> which page is mapped on 'iova', unmap it, and return its size.

Right, currently the map/unmap calls are not symmetric. But I think they
should be to get clean semantics. Without this requirement and multiple
page-sizes in use the iommu-code may have to unmap more address space than
requested. The user doesn't know what will be unmapped so it has to make
sure that no DMA is happening while unmap runs.

When we require the calls to be symmetric we can give a guarantee that
only the requested region is unmapped and allow DMA to the untouched
part of the address-space while unmap() is running.

So where the call sites do not follow this restriction we should convert
them mid-term.

> This way iommu_unmap() becomes very simple: it just iterates through
> the region, relying on iommu_ops->unmap() to return the sizes that
> were actually unmapped (very similar to how amd's iommu_unmap_page
> works today). This also means that iommu_ops->unmap() doesn't really
> need a size/order argument and we can remove it (after all drivers
> fully migrate..).

Yes, something like that. Probably the iommu_ops->unmap function should
be turned into an unmap_page function call which only takes an iova and
no size parameter. The iommu-driver unmaps the page pointing to that
iova and returns the size of the page unmapped. This still allows the
simple implementation for the unmap-call.
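
Something like this, say (sketch only; name and return type are just
illustrative):

	/*
	 * Unmap whatever single page is mapped at 'iova' and return its
	 * size in bytes (0 if nothing was mapped there).
	 */
	size_t (*unmap_page)(struct iommu_domain *domain, unsigned long iova);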

This change is no requirement for this patch-set, but if we agree on it
this patch-set should keep that direction in mind.

> The other approach which you suggest means symmetric iommu_map() and
> iommu_unmap(). It means adding a 'paddr' parameter to iommu_unmap(),
> which is easy, but maybe more concerning is the limitation that it
> incurs: users will now have to call iommu_unmap() exactly as they
> called iommu_map() beforehand. Not sure how well this will fly with
> the existing users (kvm ?) and whether we really want to enforce this
> (it doesn't mean drivers need to deal with page-size complexity. they
> are required to unmap a single page at a time, and iommu_unmap() will
> do the work for them).

It will work with KVM, that is no problem. We don't need to really
enforce the calls to be symmetric. But we can define that we only give
the guarantee about what will be unmapped when the calls are symmetric.
For example:

	iommu_map(  0, 0x100000);
	iommu_unmap(0, 0x100000); /* Guarantee that it will only unmap
				     the range 0-0x100000 */

whereas:

	iommu_map(  0, 0x100000);
	iommu_unmap(0,   0x1000); /* Guarantees that 0-0x1000 is
				     unmapped, but other undefined parts
				     of the address space may be
				     unmapped too, up to the whole
				     address space */

The alternative is that we implement page-splitting in the iommu_unmap
function. But that introduces complexity I am not sure we really need.
KVM for example just unmaps the whole address-space on destruction. For
the generic dma_ops this is also not required because the dma_map*
functions already have the requirement to be symmetric.

> 
> Another discussion:
> 
> I think we better change iommu_ops->map() to directly take a 'size'
> (in bytes) instead of an 'order' (of pages). Most (all?) drivers just
> immediately do 'size = 0x1000UL << gfp_order', so this whole size ->
> order -> size back and forth seems redundant.

Yes, this get_order thing should be changed to size long-term.

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 15:36                     ` Roedel, Joerg
  (?)
@ 2011-10-10 17:02                       ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10 17:02 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

Hi Joerg,

On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> The master branch is best to base your patches on for generic work.

Oh, great. Thanks.

> It can happen when there is a bug somewhere :)

Hmm, bug ? ;)

Ok, I'll add a BUG_ON :)

> Yes, something like that. Probably the iommu_ops->unmap function should
> be turned into an unmap_page function call which only takes an iova and
> no size parameter. The iommu-driver unmaps the page pointing to that
> iova and returns the size of the page unmapped. This still allows the
> simple implementation for the unmap-call.

Yes, exactly. It will take some time to migrate all drivers (today we
have 4 drivers, each of which implements slightly different
->unmap() semantics), but at least let's not accept any new driver
that doesn't adhere to this, otherwise it's going to be even harder
for the API to evolve.

> This change is no requirement for this patch-set, but if we agree on it
> this patch-set should keep that direction in mind.

Definitely, thanks.

> We don't need to really
> enforce the calls to be symmetric. But we can define that we only give
> the guarantee about what will be unmapped when the calls are symmetric.

Sounds good to me. I'll add this to the kernel doc patch (which I'll
submit after this patch set materializes), and when/if we move to
symmetric only, we will update it.

> The alternative is that we implement page-splitting in the iommu_unmap
> function. But that introduces complexity I am not sure we really need.

Yeah, me neither.

> Yes, this get_order thing should be changed to size long-term.

Good. That should be a simple change, I'll do it after this patch set.

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 17:02                       ` Ohad Ben-Cohen
  (?)
@ 2011-10-10 22:01                         ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10 22:01 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

> On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
>> The master branch is best to base your patches on for generic work.

It looks like the master branch is missing something like this:

From acb316aa4bcaf383e8cb1580e30c8635e0a34369 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Mon, 10 Oct 2011 23:55:51 +0200
Subject: [PATCH] iommu/core: fix build issue

Fix this:

drivers/iommu/iommu.c: In function 'iommu_commit':
drivers/iommu/iommu.c:291: error: 'iommu_ops' undeclared (first use in
this function)

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
 drivers/iommu/iommu.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 909b0d2..a5131f1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -288,7 +288,7 @@ EXPORT_SYMBOL_GPL(iommu_unmap);

 void iommu_commit(struct iommu_domain *domain)
 {
-	if (iommu_ops->commit)
-		iommu_ops->commit(domain);
+	if (domain->ops->commit)
+		domain->ops->commit(domain);
 }
 EXPORT_SYMBOL_GPL(iommu_commit);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 15:36                     ` Roedel, Joerg
  (?)
@ 2011-10-10 22:49                       ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10 22:49 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

Hi Joerg,

On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> The master branch is best to base your patches on for generic work.

Done. I've revised the patches and attached the main one
below; please tell me if it looks ok, and then I'll resubmit the
entire patch set.

Thanks,
Ohad.

commit bf1d730b5f4f7631becfcd4be52693d85bfea36b
Author: Ohad Ben-Cohen <ohad@wizery.com>
Date:   Mon Oct 10 23:50:55 2011 +0200

    iommu/core: split mapping to page sizes as supported by the hardware

    When mapping a memory region, split it to page sizes as supported
    by the iommu hardware. Always prefer bigger pages, when possible,
    in order to reduce the TLB pressure.

    The logic to do that is now added to the IOMMU core, so neither the iommu
    drivers themselves nor users of the IOMMU API have to duplicate it.

    This allows a more lenient granularity of mappings; traditionally the
    IOMMU API took 'order' (of a page) as a mapping size, and directly let
    the low level iommu drivers handle the mapping, but now that the IOMMU
    core can split arbitrary memory regions into pages, we can remove this
    limitation, so users don't have to split those regions by themselves.

    Currently the supported page sizes are advertised once and they then
    remain static. That works well for OMAP and MSM but it would probably
    not fly well with Intel's hardware, where the page size capabilities
    seem to have the potential to be different between several DMA
    remapping devices.

    register_iommu() currently sets a default pgsize behavior, so we can convert
    the IOMMU drivers in subsequent patches, and after all the drivers
    are converted, register_iommu will be changed (and the temporary
    default settings will be removed).

    Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted
    to send the mapping size in bytes instead of in page order.

    Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!

    Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
    Cc: David Brown <davidb@codeaurora.org>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Joerg Roedel <Joerg.Roedel@amd.com>
    Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
    Cc: KyongHo Cho <pullip.cho@samsung.com>
    Cc: Hiroshi DOYU <hdoyu@nvidia.com>
    Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Cc: kvm@vger.kernel.org

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 73778b7..909b0d2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */

+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
@@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;

+	/*
+	 * Set the default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages.
+	 *
+	 * This will be removed once all drivers are migrated.
+	 */
+	if (!ops->pgsize_bitmap)
+		ops->pgsize_bitmap = ~0xFFFUL;
+
+	/* find out the minimum page size only once */
+	ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);
+
 	bus->iommu_ops = ops;

 	/* Do IOMMU specific setup for this bus-type */
@@ -157,33 +172,117 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);

 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, int size, int prot)
 {
-	size_t size;
+	unsigned long orig_iova = iova;
+	int ret = 0, orig_size = size;

 	if (unlikely(domain->ops->map == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
+				(unsigned long)paddr, size);
+
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;
+
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= domain->ops->pgsize_bitmap;
+
+		/* make sure we're still sane */
+		BUG_ON(!pgsize);

-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;

-	return domain->ops->map(domain, iova, paddr, gfp_order, prot);
+		/* convert index to page order */
+		pgsize_idx -= PAGE_SHIFT;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova,
+					(unsigned long)paddr, pgsize_idx);
+
+		ret = domain->ops->map(domain, iova, paddr, pgsize_idx, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_unmap(domain, orig_iova, orig_size);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);

-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int size)
 {
-	size_t size;
+	int order, unmapped_size, unmapped_order, total_unmapped = 0;

 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
+			iova, size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
+
+	while (size > total_unmapped) {
+		order = get_order(size - total_unmapped);
+
+		unmapped_order = domain->ops->unmap(domain, iova, order);
+		if (unmapped_order < 0)
+			break;
+
+		pr_debug("unmapped: iova 0x%lx order %d\n", iova,
+							unmapped_order);
+
+		unmapped_size = 0x1000UL << unmapped_order;

-	BUG_ON(!IS_ALIGNED(iova, size));
+		iova += unmapped_size;
+		total_unmapped += unmapped_size;
+	}

-	return domain->ops->unmap(domain, iova, gfp_order);
+	return total_unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..4a8f761 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;

 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;

-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);

-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;

@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);

 		da += bytes;
 	}
@@ -474,13 +470,11 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

-		err = iommu_unmap(domain, start, order);
-		if (err < 0)
+		err = iommu_unmap(domain, start, bytes);
+		if (err < bytes)
 			break;

 		dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 710291f..a10a86c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,22 @@ struct iommu_domain {

 #ifdef CONFIG_IOMMU_API

+/**
+ * struct iommu_ops - iommu ops and capabilities
+ * @domain_init: init iommu domain
+ * @domain_destroy: destroy iommu domain
+ * @attach_dev: attach device to an iommu domain
+ * @detach_dev: detach device from an iommu domain
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @iova_to_phys: translate iova to physical address
+ * @domain_has_cap: domain capabilities query
+ * @commit: commit iommu domain
+ * @pgsize_bitmap: bitmap of supported page sizes
+ * @min_pagesz: smallest page size supported. note: this member is private
+ *		to the IOMMU core, and maintained only for efficiency sake;
+ *		drivers don't need to set it.
+ */
 struct iommu_ops {
 	int (*domain_init)(struct iommu_domain *domain);
 	void (*domain_destroy)(struct iommu_domain *domain);
@@ -62,6 +78,8 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	void (*commit)(struct iommu_domain *domain);
+	unsigned long pgsize_bitmap;
+	unsigned int min_pagesz;
 };

 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
+		     phys_addr_t paddr, int size, int prot);
 extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		       int size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ca409be..26b3199 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)

 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,

 	while (gfn < end_gfn) {
 		unsigned long unmap_pages;
-		int order;
+		int size;

 		/* Get physical address */
 		phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
 		pfn  = phys >> PAGE_SHIFT;

 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
-		unmap_pages = 1ULL << order;
+		size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+		unmap_pages = 1ULL << get_order(size);

 		/* Unpin all pages we just unmapped to not leak any memory */
 		kvm_unpin_pages(kvm, pfn, unmap_pages);

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 17:02                       ` Ohad Ben-Cohen
  (?)
@ 2011-10-11 10:19                         ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-11 10:19 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

Hi Ohad,

On Mon, Oct 10, 2011 at 01:02:46PM -0400, Ohad Ben-Cohen wrote:
> On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> > Yes, somthing like that. Probably the iommu_ops->unmap function should
> > be turned into a unmap_page function call which only takes an iova and
> > no size parameter. The iommu-driver unmaps the page pointing to that
> > iova and returns the size of the page unmapped. This still allows the
> > simple implementation for the unmap-call.
> 
> Yes, exactly. It will take some time to migrate all drivers (today we
> have 4 drivers, each of which is implementing a slightly different
> ->unmap() semantics), but at least let's not accept any new driver
> that doesn't adhere to this, otherwise it's going to be even harder
> for the API to evolve.

Agreed. We should change the existing drivers one-by-one.

> > We don't need to really enforce the calls to be symmetric. But we can
> > define that we only give the guarantee about what will be unmapped
> > when the calls are symmetric.
> 
> Sounds good to me. I'll add this to the kernel doc patch (which I'll
> submit after this patch set materializes), and when/if we move to
> symmetric only, we will update it.

Great. So if no one disagrees with this direction (this means all the
other people on Cc :) ) it is the way to go.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 22:01                         ` Ohad Ben-Cohen
  (?)
@ 2011-10-11 10:21                           ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-11 10:21 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Mon, Oct 10, 2011 at 06:01:06PM -0400, Ohad Ben-Cohen wrote:
> > On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> >> The master branch is best to base your patches on for generic work.
> 
> It looks like the master branch is missing something like this:
> 
> >From acb316aa4bcaf383e8cb1580e30c8635e0a34369 Mon Sep 17 00:00:00 2001
> From: Ohad Ben-Cohen <ohad@wizery.com>
> Date: Mon, 10 Oct 2011 23:55:51 +0200
> Subject: [PATCH] iommu/core: fix build issue
> 
> Fix this:
> 
> drivers/iommu/iommu.c: In function 'iommu_commit':
> drivers/iommu/iommu.c:291: error: 'iommu_ops' undeclared (first use in
> this function)
> 
> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
> ---
>  drivers/iommu/iommu.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 909b0d2..a5131f1 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -288,7 +288,7 @@ EXPORT_SYMBOL_GPL(iommu_unmap);
> 
>  void iommu_commit(struct iommu_domain *domain)
>  {
> -	if (iommu_ops->commit)
> -		iommu_ops->commit(domain);
> +	if (domain->ops->commit)
> +		domain->ops->commit(domain);
>  }
>  EXPORT_SYMBOL_GPL(iommu_commit);
> -- 
> 1.7.4.1

Whoops. This iommu_commit branch wasn't meant to be in master already.
I'll try to fix that.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 22:49                       ` Ohad Ben-Cohen
  (?)
@ 2011-10-11 10:38                         ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-11 10:38 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Mon, Oct 10, 2011 at 06:49:32PM -0400, Ohad Ben-Cohen wrote:
> -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
> +int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int size)
>  {
> -       size_t size;
> +       int order, unmapped_size, unmapped_order, total_unmapped = 0;
> 
>         if (unlikely(domain->ops->unmap == NULL))
>                 return -ENODEV;
> 
> -       size         = PAGE_SIZE << gfp_order;
> +       /*
> +        * The virtual address, as well as the size of the mapping, must be
> +        * aligned (at least) to the size of the smallest page supported
> +        * by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
> +                       iova, size, domain->ops->min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
> +
> +       while (size > total_unmapped) {
> +               order = get_order(size - total_unmapped);

Hmm, this actually unmaps more than requested, even in the symmetric
case. Imagine the user wants to unmap a 12 KiB region: get_order() then
returns 2, and this code will tell the iommu-driver to unmap 16 KiB.

You need to make sure that you don't pass larger regions than requested
to the low-level driver. Some logic like in the iommu_map function
should do it.
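
To make the rounding concrete, here is a small userspace sketch (illustrative
only; 4 KiB pages are assumed and the kernel's get_order() is modeled with a
plain loop):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* userspace stand-in for the kernel's get_order(): smallest order
 * such that (PAGE_SIZE << order) >= size */
static int order_of(unsigned long size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

int main(void)
{
	unsigned long size = 12 * 1024;	/* caller asks to unmap 12 KiB */
	int order = order_of(size);

	/* prints: order 2 -> driver asked to unmap 16384 bytes */
	printf("order %d -> driver asked to unmap %lu bytes\n",
	       order, PAGE_SIZE << order);
	return 0;
}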

> +
> +               unmapped_order = domain->ops->unmap(domain, iova, order);
> +               if (unmapped_order < 0)
> +                       break;
> +
> +               pr_debug("unmapped: iova 0x%lx order %d\n", iova,
> +                                                       unmapped_order);
> +
> +               unmapped_size = 0x1000UL << unmapped_order;

Please use PAGE_SIZE instead of 0x1000UL.

> 
> -       BUG_ON(!IS_ALIGNED(iova, size));
> +               iova += unmapped_size;
> +               total_unmapped += unmapped_size;
> +       }
> 
> -       return domain->ops->unmap(domain, iova, gfp_order);
> +       return total_unmapped;
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
> 
> diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
> index e8fdb88..4a8f761 100644
> --- a/drivers/iommu/omap-iovmm.c
> +++ b/drivers/iommu/omap-iovmm.c
> @@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
>         unsigned int i, j;
>         struct scatterlist *sg;
>         u32 da = new->da_start;
> -       int order;
> 
>         if (!domain || !sgt)
>                 return -EINVAL;
> @@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
>                 if (bytes_to_iopgsz(bytes) < 0)
>                         goto err_out;
> 
> -               order = get_order(bytes);
> -
>                 pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
>                          i, da, pa, bytes);
> 
> -               err = iommu_map(domain, da, pa, order, flags);
> +               err = iommu_map(domain, da, pa, bytes, flags);
>                 if (err)
>                         goto err_out;
> 
> @@ -448,10 +445,9 @@ err_out:
>                 size_t bytes;
> 
>                 bytes = sg->length + sg->offset;
> -               order = get_order(bytes);
> 
>                 /* ignore failures.. we're already handling one */
> -               iommu_unmap(domain, da, order);
> +               iommu_unmap(domain, da, bytes);
> 
>                 da += bytes;
>         }
> @@ -474,13 +470,11 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
>         start = area->da_start;
>         for_each_sg(sgt->sgl, sg, sgt->nents, i) {
>                 size_t bytes;
> -               int order;
> 
>                 bytes = sg->length + sg->offset;
> -               order = get_order(bytes);
> 
> -               err = iommu_unmap(domain, start, order);
> -               if (err < 0)
> +               err = iommu_unmap(domain, start, bytes);
> +               if (err < bytes)
>                         break;
> 
>                 dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 710291f..a10a86c 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -48,6 +48,22 @@ struct iommu_domain {
> 
>  #ifdef CONFIG_IOMMU_API
> 
> +/**
> + * struct iommu_ops - iommu ops and capabilities
> + * @domain_init: init iommu domain
> + * @domain_destroy: destroy iommu domain
> + * @attach_dev: attach device to an iommu domain
> + * @detach_dev: detach device from an iommu domain
> + * @map: map a physically contiguous memory region to an iommu domain
> + * @unmap: unmap a physically contiguous memory region from an iommu domain
> + * @iova_to_phys: translate iova to physical address
> + * @domain_has_cap: domain capabilities query
> + * @commit: commit iommu domain
> + * @pgsize_bitmap: bitmap of supported page sizes
> + * @min_pagesz: smallest page size supported. note: this member is private
> + *             to the IOMMU core, and maintained only for efficiency sake;
> + *             drivers don't need to set it.
> + */
>  struct iommu_ops {
>         int (*domain_init)(struct iommu_domain *domain);
>         void (*domain_destroy)(struct iommu_domain *domain);
> @@ -62,6 +78,8 @@ struct iommu_ops {
>         int (*domain_has_cap)(struct iommu_domain *domain,
>                               unsigned long cap);
>         void (*commit)(struct iommu_domain *domain);
> +       unsigned long pgsize_bitmap;
> +       unsigned int min_pagesz;
>  };
> 
>  extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
> @@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  extern void iommu_detach_device(struct iommu_domain *domain,
>                                 struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -                    phys_addr_t paddr, int gfp_order, int prot);
> +                    phys_addr_t paddr, int size, int prot);
>  extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> -                      int gfp_order);
> +                      int size);
>  extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
>                                       unsigned long iova);
>  extern int iommu_domain_has_cap(struct iommu_domain *domain,
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index ca409be..26b3199 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
> 
>                 /* Map into IO address space */
>                 r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
> -                             get_order(page_size), flags);
> +                             page_size, flags);
>                 if (r) {
>                         printk(KERN_ERR "kvm_iommu_map_address:"
>                                "iommu failed to map pfn=%llx\n", pfn);
> @@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
> 
>         while (gfn < end_gfn) {
>                 unsigned long unmap_pages;
> -               int order;
> +               int size;
> 
>                 /* Get physical address */
>                 phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
>                 pfn  = phys >> PAGE_SHIFT;
> 
>                 /* Unmap address from IO address space */
> -               order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
> -               unmap_pages = 1ULL << order;
> +               size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
> +               unmap_pages = 1ULL << get_order(size);
> 
>                 /* Unpin all pages we just unmapped to not leak any memory */
>                 kvm_unpin_pages(kvm, pfn, unmap_pages);

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 22:49                       ` Ohad Ben-Cohen
  (?)
@ 2011-10-11 14:59                         ` KyongHo Cho
  -1 siblings, 0 replies; 106+ messages in thread
From: KyongHo Cho @ 2011-10-11 14:59 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: Roedel, Joerg, iommu, linux-omap, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm, Hiroshi Doyu, Kukjin Kim

On Tue, Oct 11, 2011 at 7:49 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -             phys_addr_t paddr, int gfp_order, int prot)
> +             phys_addr_t paddr, int size, int prot)
>  {
Even though it is not realistic for size to become larger than
1 << ((sizeof(int) * 8) - 1), I think the type of size should be
size_t.

> -       size_t size;
> +       unsigned long orig_iova = iova;
> +       int ret = 0, orig_size = size;
>
>        if (unlikely(domain->ops->map == NULL))
>                return -ENODEV;
>
> -       size         = PAGE_SIZE << gfp_order;
> +       /*
> +        * both the virtual address and the physical one, as well as
> +        * the size of the mapping, must be aligned (at least) to the
> +        * size of the smallest page supported by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
> +                       "0x%x\n", iova, (unsigned long)paddr,
> +                       size, domain->ops->min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
> +                               (unsigned long)paddr, size);
> +
> +       while (size) {
> +               unsigned long pgsize, addr_merge = iova | paddr;
> +               unsigned int pgsize_idx;
> +
> +               /* Max page size that still fits into 'size' */
> +               pgsize_idx = __fls(size);
> +
> +               /* need to consider alignment requirements ? */
> +               if (likely(addr_merge)) {
> +                       /* Max page size allowed by both iova and paddr */
> +                       unsigned int align_pgsize_idx = __ffs(addr_merge);
> +
> +                       pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> +               }
> +
> +               /* build a mask of acceptable page sizes */
> +               pgsize = (1UL << (pgsize_idx + 1)) - 1;
> +
> +               /* throw away page sizes not supported by the hardware */
> +               pgsize &= domain->ops->pgsize_bitmap;
> +
> +               /* make sure we're still sane */
> +               BUG_ON(!pgsize);
>
> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
> +               /* pick the biggest page */
> +               pgsize_idx = __fls(pgsize);
> +               pgsize = 1UL << pgsize_idx;
>
> -       return domain->ops->map(domain, iova, paddr, gfp_order, prot);
> +               /* convert index to page order */
> +               pgsize_idx -= PAGE_SHIFT;
> +
> +               pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova,
> +                                       (unsigned long)paddr, pgsize_idx);
> +
> +               ret = domain->ops->map(domain, iova, paddr, pgsize_idx, prot);
> +               if (ret)
> +                       break;
> +
> +               iova += pgsize;
> +               paddr += pgsize;
> +               size -= pgsize;
> +       }
> +
> +       /* unroll mapping in case something went wrong */
> +       if (ret)
> +               iommu_unmap(domain, orig_iova, orig_size);
> +
domain->ops->map() might return an error because a mapping already
exists somewhere in the range from iova to iova + size. In that case
only the part that was actually mapped before the failure should be
unrolled, so it should be
iommu_unmap(domain, orig_iova, orig_size - size)
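
To make the arithmetic concrete, a tiny userspace sketch (the chunk size,
the failing call and the helper are all made up):

#include <stdio.h>

/* pretend ->map() fails on the 4th chunk */
static int fake_map(int chunk)
{
	return chunk == 3 ? -1 : 0;
}

int main(void)
{
	unsigned long orig_size = 64 * 1024;	/* request: 64 KiB */
	unsigned long size = orig_size;
	unsigned long pgsize = 16 * 1024;	/* chunks of 16 KiB */
	int chunk = 0, ret = 0;

	while (size) {
		ret = fake_map(chunk++);
		if (ret)
			break;
		size -= pgsize;
	}

	/* only what was really mapped may be unrolled */
	if (ret)
		printf("unroll %lu bytes, not %lu\n",
		       orig_size - size, orig_size);
	return 0;
}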

Regards,

KyongHo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-11 10:38                         ` Roedel, Joerg
  (?)
@ 2011-10-11 17:01                           ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-11 17:01 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Tue, Oct 11, 2011 at 12:38 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> You need to make sure that you don't pass larger regions than requested
> to the low-level driver. Some logic like in the iommu_map function
> should do it.

You're right. But the solution IMHO should be getting rid of that size
-> order -> size conversion which we do back and forth, as it's kinda
pointless.

> Please use PAGE_SIZE instead of 0x1000UL.

This one would go away too if we remove the size -> order -> size
conversions. In fact, iommu_unmap() just becomes so much simpler when
we do that.
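
One detail worth calling out in the pgsize patch below is that iova, paddr
and size are all validated against the minimum supported page size with a
single IS_ALIGNED() test on their bitwise OR. A userspace sketch of why that
works (the addresses and the 4 KiB minimum are made up):

#include <stdio.h>

/* userspace stand-in for the kernel's IS_ALIGNED(), for illustration only */
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

int main(void)
{
	unsigned long min_pagesz = 0x1000;	/* assume a 4 KiB minimum page */
	unsigned long iova  = 0x20000;
	unsigned long paddr = 0x31000;
	unsigned long size  = 0x2800;		/* 10 KiB: not page aligned */

	/*
	 * OR-ing the three values works because min_pagesz is a power of
	 * two: a low bit set in *any* of them survives the OR and trips
	 * the single alignment test.
	 */
	if (!IS_ALIGNED(iova | paddr | size, min_pagesz))
		printf("rejected: 0x%lx | 0x%lx | 0x%lx not %lu-byte aligned\n",
		       iova, paddr, size, min_pagesz);
	return 0;
}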

I'm attaching two patches: first one that removes the
size->order->size conversions we have, and then the usual pgsize patch
rebased to it.

Please take a look,

Thanks,
Ohad.

>From 7102bda53a425872591f14f850ab031b1ca5dae1 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Tue, 11 Oct 2011 16:50:38 +0200
Subject: [PATCH 1/2] iommu/core: stop converting bytes to page order back and forth

Express sizes in bytes rather than in page order, to eliminate the
size->order->size conversions we have whenever the IOMMU API is calling
the low level drivers' map/unmap methods.

Adopt all existing drivers.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
 drivers/iommu/amd_iommu.c   |   13 +++++--------
 drivers/iommu/intel-iommu.c |   11 ++++-------
 drivers/iommu/iommu.c       |    8 +++++---
 drivers/iommu/msm_iommu.c   |   19 +++++++------------
 drivers/iommu/omap-iommu.c  |   14 +++++---------
 include/linux/iommu.h       |    6 +++---
 6 files changed, 29 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 764e3da..32d502e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2702,9 +2702,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
 }

 static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
-			 phys_addr_t paddr, int gfp_order, int iommu_prot)
+			 phys_addr_t paddr, size_t page_size, int iommu_prot)
 {
-	unsigned long page_size = 0x1000UL << gfp_order;
 	struct protection_domain *domain = dom->priv;
 	int prot = 0;
 	int ret;
@@ -2721,13 +2720,11 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	return ret;
 }

-static int amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
-			   int gfp_order)
+static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
+			   size_t page_size)
 {
 	struct protection_domain *domain = dom->priv;
-	unsigned long page_size, unmap_size;
-
-	page_size  = 0x1000UL << gfp_order;
+	size_t unmap_size;

 	mutex_lock(&domain->api_lock);
 	unmap_size = iommu_unmap_page(domain, iova, page_size);
@@ -2735,7 +2732,7 @@ static int amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,

 	domain_flush_tlb_pde(domain);

-	return get_order(unmap_size);
+	return unmap_size;
 }

 static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2d53c3d..96a9bd4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3821,12 +3821,11 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,

 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
-			   int gfp_order, int iommu_prot)
+			   size_t size, int iommu_prot)
 {
 	struct dmar_domain *dmar_domain = domain->priv;
 	u64 max_addr;
 	int prot = 0;
-	size_t size;
 	int ret;

 	if (iommu_prot & IOMMU_READ)
@@ -3836,7 +3835,6 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping)
 		prot |= DMA_PTE_SNP;

-	size     = PAGE_SIZE << gfp_order;
 	max_addr = iova + size;
 	if (dmar_domain->max_addr < max_addr) {
 		u64 end;
@@ -3859,11 +3857,10 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	return ret;
 }

-static int intel_iommu_unmap(struct iommu_domain *domain,
-			     unsigned long iova, int gfp_order)
+static size_t intel_iommu_unmap(struct iommu_domain *domain,
+			     unsigned long iova, size_t size)
 {
 	struct dmar_domain *dmar_domain = domain->priv;
-	size_t size = PAGE_SIZE << gfp_order;

 	dma_pte_clear_range(dmar_domain, iova >> VTD_PAGE_SHIFT,
 			    (iova + size - 1) >> VTD_PAGE_SHIFT);
@@ -3871,7 +3868,7 @@ static int intel_iommu_unmap(struct iommu_domain *domain,
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;

-	return gfp_order;
+	return size;
 }

 static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63ca9ba..5d042e8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -168,13 +168,13 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,

 	BUG_ON(!IS_ALIGNED(iova | paddr, size));

-	return domain->ops->map(domain, iova, paddr, gfp_order, prot);
+	return domain->ops->map(domain, iova, paddr, size, prot);
 }
 EXPORT_SYMBOL_GPL(iommu_map);

 int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
 {
-	size_t size;
+	size_t size, unmapped;

 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;
@@ -183,7 +183,9 @@ int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)

 	BUG_ON(!IS_ALIGNED(iova, size));

-	return domain->ops->unmap(domain, iova, gfp_order);
+	unmapped = domain->ops->unmap(domain, iova, size);
+
+	return get_order(unmapped);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 5865dd2..13718d9 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -352,7 +352,7 @@ fail:
 }

 static int msm_iommu_map(struct iommu_domain *domain, unsigned long va,
-			 phys_addr_t pa, int order, int prot)
+			 phys_addr_t pa, size_t len, int prot)
 {
 	struct msm_priv *priv;
 	unsigned long flags;
@@ -363,7 +363,6 @@ static int msm_iommu_map(struct iommu_domain *domain, unsigned long va,
 	unsigned long *sl_pte;
 	unsigned long sl_offset;
 	unsigned int pgprot;
-	size_t len = 0x1000UL << order;
 	int ret = 0, tex, sh;

 	spin_lock_irqsave(&msm_iommu_lock, flags);
@@ -463,8 +462,8 @@ fail:
 	return ret;
 }

-static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
-			    int order)
+static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
+			    size_t len)
 {
 	struct msm_priv *priv;
 	unsigned long flags;
@@ -474,7 +473,6 @@ static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
 	unsigned long *sl_table;
 	unsigned long *sl_pte;
 	unsigned long sl_offset;
-	size_t len = 0x1000UL << order;
 	int i, ret = 0;

 	spin_lock_irqsave(&msm_iommu_lock, flags);
@@ -544,15 +542,12 @@ static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,

 	ret = __flush_iotlb(domain);

-	/*
-	 * the IOMMU API requires us to return the order of the unmapped
-	 * page (on success).
-	 */
-	if (!ret)
-		ret = order;
 fail:
 	spin_unlock_irqrestore(&msm_iommu_lock, flags);
-	return ret;
+
+	/* the IOMMU API requires us to return how many bytes were unmapped */
+	len = ret ? 0 : len;
+	return len;
 }

 static phys_addr_t msm_iommu_iova_to_phys(struct iommu_domain *domain,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 9ee2c38..1bb2971 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1019,12 +1019,11 @@ static void iopte_cachep_ctor(void *iopte)
 }

 static int omap_iommu_map(struct iommu_domain *domain, unsigned long da,
-			 phys_addr_t pa, int order, int prot)
+			 phys_addr_t pa, size_t bytes, int prot)
 {
 	struct omap_iommu_domain *omap_domain = domain->priv;
 	struct omap_iommu *oiommu = omap_domain->iommu_dev;
 	struct device *dev = oiommu->dev;
-	size_t bytes = PAGE_SIZE << order;
 	struct iotlb_entry e;
 	int omap_pgsz;
 	u32 ret, flags;
@@ -1049,19 +1048,16 @@ static int omap_iommu_map(struct iommu_domain *domain, unsigned long da,
 	return ret;
 }

-static int omap_iommu_unmap(struct iommu_domain *domain, unsigned long da,
-			    int order)
+static size_t omap_iommu_unmap(struct iommu_domain *domain, unsigned long da,
+			    size_t size)
 {
 	struct omap_iommu_domain *omap_domain = domain->priv;
 	struct omap_iommu *oiommu = omap_domain->iommu_dev;
 	struct device *dev = oiommu->dev;
-	size_t unmap_size;

-	dev_dbg(dev, "unmapping da 0x%lx order %d\n", da, order);
+	dev_dbg(dev, "unmapping da 0x%lx size %u\n", da, size);

-	unmap_size = iopgtable_clear_entry(oiommu, da);
-
-	return unmap_size ? get_order(unmap_size) : -EINVAL;
+	return iopgtable_clear_entry(oiommu, da);
 }

 static int
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 710291f..6b6ed21 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -54,9 +54,9 @@ struct iommu_ops {
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
-		   phys_addr_t paddr, int gfp_order, int prot);
-	int (*unmap)(struct iommu_domain *domain, unsigned long iova,
-		     int gfp_order);
+		   phys_addr_t paddr, size_t size, int prot);
+	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
+		     size_t size);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
 				    unsigned long iova);
 	int (*domain_has_cap)(struct iommu_domain *domain,
-- 
1.7.4.1


>From 1ddc6ad8ef8618f9a4d04bdc960c1b0f694beb49 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Mon, 10 Oct 2011 23:50:55 +0200
Subject: [PATCH 2/2] iommu/core: split mapping to page sizes as supported by the hardware

When mapping a memory region, split it to page sizes as supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce the TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.

This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.

Currently the supported page sizes are advertised once and they then
remain static. That works well for OMAP and MSM but it would probably
not fly well with intel's hardware, where the page size capabilities
seem to have the potential to be different between several DMA
remapping devices.

register_iommu() currently sets a default pgsize behavior, so we can convert
the IOMMU drivers in subsequent patches. After all the drivers
are converted, the temporary default settings will be removed.

Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted
to deal with bytes instead of page order.

Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  112 +++++++++++++++++++++++++++++++++++++++----
 drivers/iommu/omap-iovmm.c |   17 ++----
 include/linux/iommu.h      |   24 ++++++++-
 virt/kvm/iommu.c           |    8 ++--
 4 files changed, 132 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5d042e8..9355632 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */

+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
@@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;

+	/*
+	 * Set the default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages.
+	 *
+	 * This will be removed once all drivers are migrated.
+	 */
+	if (!ops->pgsize_bitmap)
+		ops->pgsize_bitmap = ~0xFFFUL;
+
+	/* find out the minimum page size only once */
+	ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);
+
 	bus->iommu_ops = ops;

 	/* Do IOMMU specific setup for this bus-type */
@@ -157,35 +172,110 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);

 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	unsigned long orig_iova = iova;
+	size_t orig_size = size;
+	int ret = 0;

 	if (unlikely(domain->ops->map == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
+				(unsigned long)paddr, size);
+
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;
+
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;

-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= domain->ops->pgsize_bitmap;

-	return domain->ops->map(domain, iova, paddr, size, prot);
+		/* make sure we're still sane */
+		BUG_ON(!pgsize);
+
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
+					(unsigned long)paddr, pgsize);
+
+		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_unmap(domain, orig_iova, orig_size - size);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);

-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	size_t size, unmapped;
+	size_t unmapped, left = size;

 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
+			iova, size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
+
+	while (left) {
+		unmapped = domain->ops->unmap(domain, iova, left);
+		if (!unmapped)
+			break;

-	BUG_ON(!IS_ALIGNED(iova, size));
+		pr_debug("unmapped: iova 0x%lx size %u\n", iova, unmapped);

-	unmapped = domain->ops->unmap(domain, iova, size);
+		iova += unmapped;
+		left -= unmapped;
+	}

-	return get_order(unmapped);
+	return size - left;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..0b7b14c 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;

 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;

-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);

-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;

@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);

 		da += bytes;
 	}
@@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	size_t total = area->da_end - area->da_start;
 	const struct sg_table *sgt = area->sgt;
 	struct scatterlist *sg;
-	int i, err;
+	int i;
+	size_t unmapped;

 	BUG_ON(!sgtable_ok(sgt));
 	BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
@@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

-		err = iommu_unmap(domain, start, order);
-		if (err < 0)
+		unmapped = iommu_unmap(domain, start, bytes);
+		if (unmapped < bytes)
 			break;

 		dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6b6ed21..76d7ce4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,22 @@ struct iommu_domain {

 #ifdef CONFIG_IOMMU_API

+/**
+ * struct iommu_ops - iommu ops and capabilities
+ * @domain_init: init iommu domain
+ * @domain_destroy: destroy iommu domain
+ * @attach_dev: attach device to an iommu domain
+ * @detach_dev: detach device from an iommu domain
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @iova_to_phys: translate iova to physical address
+ * @domain_has_cap: domain capabilities query
+ * @commit: commit iommu domain
+ * @pgsize_bitmap: bitmap of supported page sizes
+ * @min_pagesz: smallest page size supported. note: this member is private
+ *		to the IOMMU core, and maintained only for efficiency sake;
+ *		drivers don't need to set it.
+ */
 struct iommu_ops {
 	int (*domain_init)(struct iommu_domain *domain);
 	void (*domain_destroy)(struct iommu_domain *domain);
@@ -62,6 +78,8 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	void (*commit)(struct iommu_domain *domain);
+	unsigned long pgsize_bitmap;
+	unsigned int min_pagesz;
 };

 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
-extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		     phys_addr_t paddr, size_t size, int prot);
+extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ca409be..35ed096 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)

 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,

 	while (gfn < end_gfn) {
 		unsigned long unmap_pages;
-		int order;
+		size_t size;

 		/* Get physical address */
 		phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
 		pfn  = phys >> PAGE_SHIFT;

 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
-		unmap_pages = 1ULL << order;
+		size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+		unmap_pages = 1ULL << get_order(size);

 		/* Unpin all pages we just unmapped to not leak any memory */
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-10-11 17:01                           ` Ohad Ben-Cohen
  0 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-11 17:01 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Tue, Oct 11, 2011 at 12:38 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> You need to make sure that you don't pass larger regions than requested
> to the low-level driver. Some logic like in the iommu_map function
> should do it.

You're right. But the solution IMHO should be getting rid of that size
-> order -> size conversion which we do back and forth, as it's kinda
pointless.
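
To make the round trip concrete, here is a tiny userspace-only sketch
(illustrative, not part of the attached patches): a size that isn't a
power-of-two number of pages is rounded up by the order conversion, so
the low-level driver can be handed more than the caller asked for:

/* illustrative only: order_of() is a crude stand-in for get_order() */
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

static int order_of(unsigned long size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

int main(void)
{
	unsigned long size = 6 * 1024;			/* caller wants 6 KiB */
	int order = order_of(size);			/* rounds up to 1     */
	unsigned long seen = PAGE_SIZE << order;	/* driver sees 8 KiB  */

	printf("requested %lu bytes, driver handles %lu\n", size, seen);
	return 0;
}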

> Please use PAGE_SIZE instead of 0x1000UL.

This one would go away too if we remove the size -> order -> size
conversions. In fact, iommu_unmap() just becomes so much simpler when
we do that.

I'm attaching two patches: first one that removes the
size->order->size conversions we have, and then the usual pgsize patch
rebased to it.

Please take a look,

Thanks,
Ohad.

>From 7102bda53a425872591f14f850ab031b1ca5dae1 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Tue, 11 Oct 2011 16:50:38 +0200
Subject: [PATCH 1/2] iommu/core: stop converting bytes to page order back and forth

Express sizes in bytes rather than in page order, to eliminate the
size->order->size conversions we have whenever the IOMMU API is calling
the low level drivers' map/unmap methods.

Adopt all existing drivers.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
 drivers/iommu/amd_iommu.c   |   13 +++++--------
 drivers/iommu/intel-iommu.c |   11 ++++-------
 drivers/iommu/iommu.c       |    8 +++++---
 drivers/iommu/msm_iommu.c   |   19 +++++++------------
 drivers/iommu/omap-iommu.c  |   14 +++++---------
 include/linux/iommu.h       |    6 +++---
 6 files changed, 29 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 764e3da..32d502e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2702,9 +2702,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
 }

 static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
-			 phys_addr_t paddr, int gfp_order, int iommu_prot)
+			 phys_addr_t paddr, size_t page_size, int iommu_prot)
 {
-	unsigned long page_size = 0x1000UL << gfp_order;
 	struct protection_domain *domain = dom->priv;
 	int prot = 0;
 	int ret;
@@ -2721,13 +2720,11 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	return ret;
 }

-static int amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
-			   int gfp_order)
+static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
+			   size_t page_size)
 {
 	struct protection_domain *domain = dom->priv;
-	unsigned long page_size, unmap_size;
-
-	page_size  = 0x1000UL << gfp_order;
+	size_t unmap_size;

 	mutex_lock(&domain->api_lock);
 	unmap_size = iommu_unmap_page(domain, iova, page_size);
@@ -2735,7 +2732,7 @@ static int amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,

 	domain_flush_tlb_pde(domain);

-	return get_order(unmap_size);
+	return unmap_size;
 }

 static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2d53c3d..96a9bd4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3821,12 +3821,11 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,

 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
-			   int gfp_order, int iommu_prot)
+			   size_t size, int iommu_prot)
 {
 	struct dmar_domain *dmar_domain = domain->priv;
 	u64 max_addr;
 	int prot = 0;
-	size_t size;
 	int ret;

 	if (iommu_prot & IOMMU_READ)
@@ -3836,7 +3835,6 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping)
 		prot |= DMA_PTE_SNP;

-	size     = PAGE_SIZE << gfp_order;
 	max_addr = iova + size;
 	if (dmar_domain->max_addr < max_addr) {
 		u64 end;
@@ -3859,11 +3857,10 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	return ret;
 }

-static int intel_iommu_unmap(struct iommu_domain *domain,
-			     unsigned long iova, int gfp_order)
+static size_t intel_iommu_unmap(struct iommu_domain *domain,
+			     unsigned long iova, size_t size)
 {
 	struct dmar_domain *dmar_domain = domain->priv;
-	size_t size = PAGE_SIZE << gfp_order;

 	dma_pte_clear_range(dmar_domain, iova >> VTD_PAGE_SHIFT,
 			    (iova + size - 1) >> VTD_PAGE_SHIFT);
@@ -3871,7 +3868,7 @@ static int intel_iommu_unmap(struct iommu_domain *domain,
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;

-	return gfp_order;
+	return size;
 }

 static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63ca9ba..5d042e8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -168,13 +168,13 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,

 	BUG_ON(!IS_ALIGNED(iova | paddr, size));

-	return domain->ops->map(domain, iova, paddr, gfp_order, prot);
+	return domain->ops->map(domain, iova, paddr, size, prot);
 }
 EXPORT_SYMBOL_GPL(iommu_map);

 int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
 {
-	size_t size;
+	size_t size, unmapped;

 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;
@@ -183,7 +183,9 @@ int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)

 	BUG_ON(!IS_ALIGNED(iova, size));

-	return domain->ops->unmap(domain, iova, gfp_order);
+	unmapped = domain->ops->unmap(domain, iova, size);
+
+	return get_order(unmapped);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 5865dd2..13718d9 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -352,7 +352,7 @@ fail:
 }

 static int msm_iommu_map(struct iommu_domain *domain, unsigned long va,
-			 phys_addr_t pa, int order, int prot)
+			 phys_addr_t pa, size_t len, int prot)
 {
 	struct msm_priv *priv;
 	unsigned long flags;
@@ -363,7 +363,6 @@ static int msm_iommu_map(struct iommu_domain *domain, unsigned long va,
 	unsigned long *sl_pte;
 	unsigned long sl_offset;
 	unsigned int pgprot;
-	size_t len = 0x1000UL << order;
 	int ret = 0, tex, sh;

 	spin_lock_irqsave(&msm_iommu_lock, flags);
@@ -463,8 +462,8 @@ fail:
 	return ret;
 }

-static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
-			    int order)
+static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
+			    size_t len)
 {
 	struct msm_priv *priv;
 	unsigned long flags;
@@ -474,7 +473,6 @@ static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
 	unsigned long *sl_table;
 	unsigned long *sl_pte;
 	unsigned long sl_offset;
-	size_t len = 0x1000UL << order;
 	int i, ret = 0;

 	spin_lock_irqsave(&msm_iommu_lock, flags);
@@ -544,15 +542,12 @@ static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,

 	ret = __flush_iotlb(domain);

-	/*
-	 * the IOMMU API requires us to return the order of the unmapped
-	 * page (on success).
-	 */
-	if (!ret)
-		ret = order;
 fail:
 	spin_unlock_irqrestore(&msm_iommu_lock, flags);
-	return ret;
+
+	/* the IOMMU API requires us to return how many bytes were unmapped */
+	len = ret ? 0 : len;
+	return len;
 }

 static phys_addr_t msm_iommu_iova_to_phys(struct iommu_domain *domain,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 9ee2c38..1bb2971 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1019,12 +1019,11 @@ static void iopte_cachep_ctor(void *iopte)
 }

 static int omap_iommu_map(struct iommu_domain *domain, unsigned long da,
-			 phys_addr_t pa, int order, int prot)
+			 phys_addr_t pa, size_t bytes, int prot)
 {
 	struct omap_iommu_domain *omap_domain = domain->priv;
 	struct omap_iommu *oiommu = omap_domain->iommu_dev;
 	struct device *dev = oiommu->dev;
-	size_t bytes = PAGE_SIZE << order;
 	struct iotlb_entry e;
 	int omap_pgsz;
 	u32 ret, flags;
@@ -1049,19 +1048,16 @@ static int omap_iommu_map(struct iommu_domain *domain, unsigned long da,
 	return ret;
 }

-static int omap_iommu_unmap(struct iommu_domain *domain, unsigned long da,
-			    int order)
+static size_t omap_iommu_unmap(struct iommu_domain *domain, unsigned long da,
+			    size_t size)
 {
 	struct omap_iommu_domain *omap_domain = domain->priv;
 	struct omap_iommu *oiommu = omap_domain->iommu_dev;
 	struct device *dev = oiommu->dev;
-	size_t unmap_size;

-	dev_dbg(dev, "unmapping da 0x%lx order %d\n", da, order);
+	dev_dbg(dev, "unmapping da 0x%lx size %u\n", da, size);

-	unmap_size = iopgtable_clear_entry(oiommu, da);
-
-	return unmap_size ? get_order(unmap_size) : -EINVAL;
+	return iopgtable_clear_entry(oiommu, da);
 }

 static int
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 710291f..6b6ed21 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -54,9 +54,9 @@ struct iommu_ops {
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
-		   phys_addr_t paddr, int gfp_order, int prot);
-	int (*unmap)(struct iommu_domain *domain, unsigned long iova,
-		     int gfp_order);
+		   phys_addr_t paddr, size_t size, int prot);
+	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
+		     size_t size);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
 				    unsigned long iova);
 	int (*domain_has_cap)(struct iommu_domain *domain,
-- 
1.7.4.1


>From 1ddc6ad8ef8618f9a4d04bdc960c1b0f694beb49 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Mon, 10 Oct 2011 23:50:55 +0200
Subject: [PATCH 2/2] iommu/core: split mapping to page sizes as supported by the hardware

When mapping a memory region, split it to page sizes as supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce the TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.
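
For illustration only (userspace sketch, not part of this patch; the
OMAP-style page-size set is an assumption), this is how the selection
loop below carves a 1 MiB + 64 KiB + 4 KiB region into one page of each
size, largest first:

/* illustrative sketch of the greedy page-size selection in iommu_map() */
#include <stdio.h>

#define SZ_4K	0x00001000UL
#define SZ_64K	0x00010000UL
#define SZ_1M	0x00100000UL
#define SZ_16M	0x01000000UL

int main(void)
{
	unsigned long pgsize_bitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M;
	unsigned long iova = 0x20000000UL, paddr = 0x30000000UL;
	unsigned long size = SZ_1M + SZ_64K + SZ_4K;

	while (size) {
		unsigned long pgsize, addr_merge = iova | paddr;
		unsigned int pgsize_idx;

		/* max page size that still fits into 'size' (__fls) */
		pgsize_idx = 8 * sizeof(long) - 1 - __builtin_clzl(size);

		/* max page size allowed by the iova/paddr alignment (__ffs) */
		if (addr_merge) {
			unsigned int align_idx = __builtin_ctzl(addr_merge);

			if (align_idx < pgsize_idx)
				pgsize_idx = align_idx;
		}

		/* keep only sizes the hardware supports, pick the biggest;
		 * the kernel code BUG()s if nothing is left, here it never is */
		pgsize = ((1UL << (pgsize_idx + 1)) - 1) & pgsize_bitmap;
		pgsize = 1UL << (8 * sizeof(long) - 1 - __builtin_clzl(pgsize));

		printf("map 0x%lx bytes at iova 0x%lx\n", pgsize, iova);

		iova += pgsize;
		paddr += pgsize;
		size -= pgsize;
	}

	return 0;
}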

This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.

Currently the supported page sizes are advertised once and then remain
static. That works well for OMAP and MSM, but it would probably not fly
well with Intel's hardware, where the page size capabilities may differ
between several DMA remapping devices.

register_iommu() currently sets a default pgsize behavior, so we can convert
the IOMMU drivers in subsequent patches. After all the drivers
are converted, the temporary default settings will be removed.

Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted
to deal with bytes instead of page order.

Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  112 +++++++++++++++++++++++++++++++++++++++----
 drivers/iommu/omap-iovmm.c |   17 ++----
 include/linux/iommu.h      |   24 ++++++++-
 virt/kvm/iommu.c           |    8 ++--
 4 files changed, 132 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5d042e8..9355632 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */

+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
@@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;

+	/*
+	 * Set the default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages.
+	 *
+	 * This will be removed once all drivers are migrated.
+	 */
+	if (!ops->pgsize_bitmap)
+		ops->pgsize_bitmap = ~0xFFFUL;
+
+	/* find out the minimum page size only once */
+	ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);
+
 	bus->iommu_ops = ops;

 	/* Do IOMMU specific setup for this bus-type */
@@ -157,35 +172,110 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);

 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	unsigned long orig_iova = iova;
+	size_t orig_size = size;
+	int ret = 0;

 	if (unlikely(domain->ops->map == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
+				(unsigned long)paddr, size);
+
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;
+
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;

-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= domain->ops->pgsize_bitmap;

-	return domain->ops->map(domain, iova, paddr, size, prot);
+		/* make sure we're still sane */
+		BUG_ON(!pgsize);
+
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
+					(unsigned long)paddr, pgsize);
+
+		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_unmap(domain, orig_iova, orig_size - size);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);

-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	size_t size, unmapped;
+	size_t unmapped, left = size;

 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
+			iova, size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
+
+	while (left) {
+		unmapped = domain->ops->unmap(domain, iova, left);
+		if (!unmapped)
+			break;

-	BUG_ON(!IS_ALIGNED(iova, size));
+		pr_debug("unmapped: iova 0x%lx size %u\n", iova, unmapped);

-	unmapped = domain->ops->unmap(domain, iova, size);
+		iova += unmapped;
+		left -= unmapped;
+	}

-	return get_order(unmapped);
+	return size - left;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..0b7b14c 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;

 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;

-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);

-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;

@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);

 		da += bytes;
 	}
@@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	size_t total = area->da_end - area->da_start;
 	const struct sg_table *sgt = area->sgt;
 	struct scatterlist *sg;
-	int i, err;
+	int i;
+	size_t unmapped;

 	BUG_ON(!sgtable_ok(sgt));
 	BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
@@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

-		err = iommu_unmap(domain, start, order);
-		if (err < 0)
+		unmapped = iommu_unmap(domain, start, bytes);
+		if (unmapped < bytes)
 			break;

 		dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6b6ed21..76d7ce4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,22 @@ struct iommu_domain {

 #ifdef CONFIG_IOMMU_API

+/**
+ * struct iommu_ops - iommu ops and capabilities
+ * @domain_init: init iommu domain
+ * @domain_destroy: destroy iommu domain
+ * @attach_dev: attach device to an iommu domain
+ * @detach_dev: detach device from an iommu domain
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @iova_to_phys: translate iova to physical address
+ * @domain_has_cap: domain capabilities query
+ * @commit: commit iommu domain
+ * @pgsize_bitmap: bitmap of supported page sizes
+ * @min_pagesz: smallest page size supported. note: this member is private
+ *		to the IOMMU core, and maintained only for efficiency sake;
+ *		drivers don't need to set it.
+ */
 struct iommu_ops {
 	int (*domain_init)(struct iommu_domain *domain);
 	void (*domain_destroy)(struct iommu_domain *domain);
@@ -62,6 +78,8 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	void (*commit)(struct iommu_domain *domain);
+	unsigned long pgsize_bitmap;
+	unsigned int min_pagesz;
 };

 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
-extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		     phys_addr_t paddr, size_t size, int prot);
+extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ca409be..35ed096 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)

 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,

 	while (gfn < end_gfn) {
 		unsigned long unmap_pages;
-		int order;
+		size_t size;

 		/* Get physical address */
 		phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
 		pfn  = phys >> PAGE_SHIFT;

 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
-		unmap_pages = 1ULL << order;
+		size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+		unmap_pages = 1ULL << get_order(size);

 		/* Unpin all pages we just unmapped to not leak any memory */
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-11 14:59                         ` KyongHo Cho
  (?)
@ 2011-10-11 17:04                           ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-11 17:04 UTC (permalink / raw)
  To: KyongHo Cho
  Cc: Roedel, Joerg, iommu, linux-omap, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Stepan Moskovchenko, kvm, Hiroshi Doyu, Kukjin Kim

Hi KyongHo,

On Tue, Oct 11, 2011 at 4:59 PM, KyongHo Cho <pullip.cho@samsung.com> wrote:
> I think the type of size should be size_t.
..
> Thus, it should be
> iommu_unmap(domain, orig_iova, orig_size - size)

Agree with both suggestions, thanks!

P.S - please only keep the parts of my email which you are replying
to, otherwise it's a bit hard to find your comments.

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-11 17:01                           ` Ohad Ben-Cohen
  (?)
@ 2011-10-14 13:35                             ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-14 13:35 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Tue, Oct 11, 2011 at 01:01:23PM -0400, Ohad Ben-Cohen wrote:
> On Tue, Oct 11, 2011 at 12:38 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> > You need to make sure that you don;t pass larger regions then requested
> > to the low-level driver. Some logic like in the iommu_map function
> > should do it.
> 
> You're right. But the solution IMHO should be getting rid of that size
> -> order -> size conversion which we do back and forth, as it's kinda
> pointless.

Right.

> 
> > Please use PAGE_SIZE instead of 0x1000UL.
> 
> This one would go away too if we remove the size -> order -> size
> conversions. In fact, iommu_unmap() just becomes so much simpler when
> we do that.
> 
> I'm attaching two patches: first one that removes the
> size->order->size conversions we have, and then the usual pgsize patch
> rebased to it.
> 
> Please take a look,
> 
> Thanks,
> Ohad.
> 
> >From 7102bda53a425872591f14f850ab031b1ca5dae1 Mon Sep 17 00:00:00 2001
> From: Ohad Ben-Cohen <ohad@wizery.com>
> Date: Tue, 11 Oct 2011 16:50:38 +0200
> Subject: [PATCH 1/2] iommu/core: stop converting bytes to page order
> back and forth
> 
> Express sizes in bytes rather than in page order, to eliminate the
> size->order->size conversions we have whenever the IOMMU API is calling
> the low level drivers' map/unmap methods.
> 
> Adopt all existing drivers.
> 
> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
> ---
>  drivers/iommu/amd_iommu.c   |   13 +++++--------
>  drivers/iommu/intel-iommu.c |   11 ++++-------
>  drivers/iommu/iommu.c       |    8 +++++---
>  drivers/iommu/msm_iommu.c   |   19 +++++++------------
>  drivers/iommu/omap-iommu.c  |   14 +++++---------
>  include/linux/iommu.h       |    6 +++---
>  6 files changed, 29 insertions(+), 42 deletions(-)

This patch looks good. Please include it in the page-size patch-set.

>  drivers/iommu/iommu.c      |  112 +++++++++++++++++++++++++++++++++++++++----
>  drivers/iommu/omap-iovmm.c |   17 ++----
>  include/linux/iommu.h      |   24 ++++++++-
>  virt/kvm/iommu.c           |    8 ++--
>  4 files changed, 132 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5d042e8..9355632 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -16,6 +16,8 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>   */
> 
> +#define pr_fmt(fmt)    "%s: " fmt, __func__
> +
>  #include <linux/device.h>
>  #include <linux/kernel.h>
>  #include <linux/bug.h>
> @@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct
> iommu_ops *ops)
>         if (bus->iommu_ops != NULL)
>                 return -EBUSY;
> 
> +       /*
> +        * Set the default pgsize values, which retain the existing
> +        * IOMMU API behavior: drivers will be called to map
> +        * regions that are sized/aligned to order of 4KiB pages.
> +        *
> +        * This will be removed once all drivers are migrated.
> +        */
> +       if (!ops->pgsize_bitmap)
> +               ops->pgsize_bitmap = ~0xFFFUL;
> +
> +       /* find out the minimum page size only once */
> +       ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);

Hmm, I'd like to constify the iommu_ops structures in the future. This
wouldn't work then anymore. How about renaming it to io_page_size
(because it is the base page-size of the iommu) and letting the driver
set it?
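
Something like this, perhaps (hypothetical sketch as a standalone
userspace mock; the io_page_size name and the values are illustrative
only, not an agreed-on interface):

/* with a const ops structure the core can no longer compute and store
 * min_pagesz itself, so the driver supplies the base page size */
#include <stdio.h>

struct mock_iommu_ops {
	unsigned long pgsize_bitmap;
	unsigned int io_page_size;	/* set by the driver, not the core */
};

static const struct mock_iommu_ops omap_like_ops = {
	.pgsize_bitmap	= 0x1000 | 0x10000 | 0x100000 | 0x1000000,
	.io_page_size	= 0x1000,
};

int main(void)
{
	/* omap_like_ops.io_page_size = 0x2000;  <- would not compile: const */
	printf("base page size: 0x%x\n", omap_like_ops.io_page_size);
	return 0;
}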


> +
>         bus->iommu_ops = ops;
> 
>         /* Do IOMMU specific setup for this bus-type */
> @@ -157,35 +172,110 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
>  EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
> 
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -             phys_addr_t paddr, int gfp_order, int prot)
> +             phys_addr_t paddr, size_t size, int prot)
>  {
> -       size_t size;
> +       unsigned long orig_iova = iova;
> +       size_t orig_size = size;
> +       int ret = 0;
> 
>         if (unlikely(domain->ops->map == NULL))
>                 return -ENODEV;
> 
> -       size         = PAGE_SIZE << gfp_order;
> +       /*
> +        * both the virtual address and the physical one, as well as
> +        * the size of the mapping, must be aligned (at least) to the
> +        * size of the smallest page supported by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
> +                       "0x%x\n", iova, (unsigned long)paddr,
> +                       size, domain->ops->min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
> +                               (unsigned long)paddr, size);
> +
> +       while (size) {
> +               unsigned long pgsize, addr_merge = iova | paddr;
> +               unsigned int pgsize_idx;
> +
> +               /* Max page size that still fits into 'size' */
> +               pgsize_idx = __fls(size);
> +
> +               /* need to consider alignment requirements ? */
> +               if (likely(addr_merge)) {
> +                       /* Max page size allowed by both iova and paddr */
> +                       unsigned int align_pgsize_idx = __ffs(addr_merge);
> +
> +                       pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> +               }
> +
> +               /* build a mask of acceptable page sizes */
> +               pgsize = (1UL << (pgsize_idx + 1)) - 1;
> 
> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
> +               /* throw away page sizes not supported by the hardware */
> +               pgsize &= domain->ops->pgsize_bitmap;
> 
> -       return domain->ops->map(domain, iova, paddr, size, prot);
> +               /* make sure we're still sane */
> +               BUG_ON(!pgsize);
> +
> +               /* pick the biggest page */
> +               pgsize_idx = __fls(pgsize);
> +               pgsize = 1UL << pgsize_idx;
> +
> +               pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
> +                                       (unsigned long)paddr, pgsize);
> +
> +               ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
> +               if (ret)
> +                       break;
> +
> +               iova += pgsize;
> +               paddr += pgsize;
> +               size -= pgsize;
> +       }
> +
> +       /* unroll mapping in case something went wrong */
> +       if (ret)
> +               iommu_unmap(domain, orig_iova, orig_size - size);
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
> 
> -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
> +size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> size_t size)
>  {
> -       size_t size, unmapped;
> +       size_t unmapped, left = size;
> 
>         if (unlikely(domain->ops->unmap == NULL))
>                 return -ENODEV;
> 
> -       size         = PAGE_SIZE << gfp_order;
> +       /*
> +        * The virtual address, as well as the size of the mapping, must be
> +        * aligned (at least) to the size of the smallest page supported
> +        * by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
> +                       iova, size, domain->ops->min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
> +
> +       while (left) {

This needs to be (left > 0). The drivers are allowed to unmap more than
requested, so this value may turn negative.

> +               unmapped = domain->ops->unmap(domain, iova, left);
> +               if (!unmapped)
> +                       break;
> 
> -       BUG_ON(!IS_ALIGNED(iova, size));
> +               pr_debug("unmapped: iova 0x%lx size %u\n", iova, unmapped);
> 
> -       unmapped = domain->ops->unmap(domain, iova, size);
> +               iova += unmapped;
> +               left -= unmapped;
> +       }
> 
> -       return get_order(unmapped);
> +       return size - left;
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
> 
> diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
> index e8fdb88..0b7b14c 100644
> --- a/drivers/iommu/omap-iovmm.c
> +++ b/drivers/iommu/omap-iovmm.c
> @@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain
> *domain, struct iovm_struct *new,
>         unsigned int i, j;
>         struct scatterlist *sg;
>         u32 da = new->da_start;
> -       int order;
> 
>         if (!domain || !sgt)
>                 return -EINVAL;
> @@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain
> *domain, struct iovm_struct *new,
>                 if (bytes_to_iopgsz(bytes) < 0)
>                         goto err_out;
> 
> -               order = get_order(bytes);
> -
>                 pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
>                          i, da, pa, bytes);
> 
> -               err = iommu_map(domain, da, pa, order, flags);
> +               err = iommu_map(domain, da, pa, bytes, flags);
>                 if (err)
>                         goto err_out;
> 
> @@ -448,10 +445,9 @@ err_out:
>                 size_t bytes;
> 
>                 bytes = sg->length + sg->offset;
> -               order = get_order(bytes);
> 
>                 /* ignore failures.. we're already handling one */
> -               iommu_unmap(domain, da, order);
> +               iommu_unmap(domain, da, bytes);
> 
>                 da += bytes;
>         }
> @@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain
> *domain, struct omap_iommu *obj,
>         size_t total = area->da_end - area->da_start;
>         const struct sg_table *sgt = area->sgt;
>         struct scatterlist *sg;
> -       int i, err;
> +       int i;
> +       size_t unmapped;
> 
>         BUG_ON(!sgtable_ok(sgt));
>         BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
> @@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain
> *domain, struct omap_iommu *obj,
>         start = area->da_start;
>         for_each_sg(sgt->sgl, sg, sgt->nents, i) {
>                 size_t bytes;
> -               int order;
> 
>                 bytes = sg->length + sg->offset;
> -               order = get_order(bytes);
> 
> -               err = iommu_unmap(domain, start, order);
> -               if (err < 0)
> +               unmapped = iommu_unmap(domain, start, bytes);
> +               if (unmapped < bytes)
>                         break;
> 
>                 dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 6b6ed21..76d7ce4 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -48,6 +48,22 @@ struct iommu_domain {
> 
>  #ifdef CONFIG_IOMMU_API
> 
> +/**
> + * struct iommu_ops - iommu ops and capabilities
> + * @domain_init: init iommu domain
> + * @domain_destroy: destroy iommu domain
> + * @attach_dev: attach device to an iommu domain
> + * @detach_dev: detach device from an iommu domain
> + * @map: map a physically contiguous memory region to an iommu domain
> + * @unmap: unmap a physically contiguous memory region from an iommu domain
> + * @iova_to_phys: translate iova to physical address
> + * @domain_has_cap: domain capabilities query
> + * @commit: commit iommu domain
> + * @pgsize_bitmap: bitmap of supported page sizes
> + * @min_pagesz: smallest page size supported. note: this member is private
> + *             to the IOMMU core, and maintained only for efficiency sake;
> + *             drivers don't need to set it.
> + */
>  struct iommu_ops {
>         int (*domain_init)(struct iommu_domain *domain);
>         void (*domain_destroy)(struct iommu_domain *domain);
> @@ -62,6 +78,8 @@ struct iommu_ops {
>         int (*domain_has_cap)(struct iommu_domain *domain,
>                               unsigned long cap);
>         void (*commit)(struct iommu_domain *domain);
> +       unsigned long pgsize_bitmap;
> +       unsigned int min_pagesz;
>  };
> 
>  extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
> @@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  extern void iommu_detach_device(struct iommu_domain *domain,
>                                 struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -                    phys_addr_t paddr, int gfp_order, int prot);
> -extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> -                      int gfp_order);
> +                    phys_addr_t paddr, size_t size, int prot);
> +extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> +                      size_t size);
>  extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
>                                       unsigned long iova);
>  extern int iommu_domain_has_cap(struct iommu_domain *domain,
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index ca409be..35ed096 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct
> kvm_memory_slot *slot)
> 
>                 /* Map into IO address space */
>                 r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
> -                             get_order(page_size), flags);
> +                             page_size, flags);
>                 if (r) {
>                         printk(KERN_ERR "kvm_iommu_map_address:"
>                                "iommu failed to map pfn=%llx\n", pfn);
> @@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
> 
>         while (gfn < end_gfn) {
>                 unsigned long unmap_pages;
> -               int order;
> +               size_t size;
> 
>                 /* Get physical address */
>                 phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
>                 pfn  = phys >> PAGE_SHIFT;
> 
>                 /* Unmap address from IO address space */
> -               order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
> -               unmap_pages = 1ULL << order;
> +               size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
> +               unmap_pages = 1ULL << get_order(size);
> 
>                 /* Unpin all pages we just unmapped to not leak any memory */
>                 kvm_unpin_pages(kvm, pfn, unmap_pages);
> --
> 1.7.4.1

Btw, how have you tested this code?

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-10-14 13:35                             ` Roedel, Joerg
  0 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-14 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 11, 2011 at 01:01:23PM -0400, Ohad Ben-Cohen wrote:
> On Tue, Oct 11, 2011 at 12:38 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> > You need to make sure that you don't pass larger regions than requested
> > to the low-level driver. Some logic like in the iommu_map function
> > should do it.
> 
> You're right. But the solution IMHO should be getting rid of that size
> -> order -> size conversion which we do back and forth, as it's kinda
> pointless.

Right.
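
The round trip in question looks roughly like this (usual variable names
assumed); note that get_order() also rounds any length that isn't a whole
power-of-two number of pages up to the next order, so the core can end up
operating on more bytes than the caller asked for:

    /* caller side: bytes -> order (e.g. 12KiB becomes order 2) */
    order = get_order(bytes);
    err = iommu_map(domain, da, pa, order, prot);

    /* core side: order -> bytes again (order 2 becomes 16KiB) */
    size = PAGE_SIZE << gfp_order;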

> 
> > Please use PAGE_SIZE instead of 0x1000UL.
> 
> This one would go away too if we remove the size -> order -> size
> conversions. In fact, iommu_unmap() just becomes so much simpler when
> we do that.
> 
> I'm attaching two patches: first one that removes the
> size->order->size conversions we have, and then the usual pgsize patch
> rebased to it.
> 
> Please take a look,
> 
> Thanks,
> Ohad.
> 
> >From 7102bda53a425872591f14f850ab031b1ca5dae1 Mon Sep 17 00:00:00 2001
> From: Ohad Ben-Cohen <ohad@wizery.com>
> Date: Tue, 11 Oct 2011 16:50:38 +0200
> Subject: [PATCH 1/2] iommu/core: stop converting bytes to page order
> back and forth
> 
> Express sizes in bytes rather than in page order, to eliminate the
> size->order->size conversions we have whenever the IOMMU API is calling
> the low level drivers' map/unmap methods.
> 
> Adopt all existing drivers.
> 
> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
> ---
>  drivers/iommu/amd_iommu.c   |   13 +++++--------
>  drivers/iommu/intel-iommu.c |   11 ++++-------
>  drivers/iommu/iommu.c       |    8 +++++---
>  drivers/iommu/msm_iommu.c   |   19 +++++++------------
>  drivers/iommu/omap-iommu.c  |   14 +++++---------
>  include/linux/iommu.h       |    6 +++---
>  6 files changed, 29 insertions(+), 42 deletions(-)

This patch looks good. Please include it in the page-size patch-set.

>  drivers/iommu/iommu.c      |  112 +++++++++++++++++++++++++++++++++++++++----
>  drivers/iommu/omap-iovmm.c |   17 ++----
>  include/linux/iommu.h      |   24 ++++++++-
>  virt/kvm/iommu.c           |    8 ++--
>  4 files changed, 132 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5d042e8..9355632 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -16,6 +16,8 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>   */
> 
> +#define pr_fmt(fmt)    "%s: " fmt, __func__
> +
>  #include <linux/device.h>
>  #include <linux/kernel.h>
>  #include <linux/bug.h>
> @@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct
> iommu_ops *ops)
>         if (bus->iommu_ops != NULL)
>                 return -EBUSY;
> 
> +       /*
> +        * Set the default pgsize values, which retain the existing
> +        * IOMMU API behavior: drivers will be called to map
> +        * regions that are sized/aligned to order of 4KiB pages.
> +        *
> +        * This will be removed once all drivers are migrated.
> +        */
> +       if (!ops->pgsize_bitmap)
> +               ops->pgsize_bitmap = ~0xFFFUL;
> +
> +       /* find out the minimum page size only once */
> +       ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);

Hmm, I'd like to constify the iommu_ops structures in the future. This
wouldn't work then anymore. How about renaming it to io_page_size
(because it is the base page-size of the iommu) and let the driver set
it?


> +
>         bus->iommu_ops = ops;
> 
>         /* Do IOMMU specific setup for this bus-type */
> @@ -157,35 +172,110 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
>  EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
> 
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -             phys_addr_t paddr, int gfp_order, int prot)
> +             phys_addr_t paddr, size_t size, int prot)
>  {
> -       size_t size;
> +       unsigned long orig_iova = iova;
> +       size_t orig_size = size;
> +       int ret = 0;
> 
>         if (unlikely(domain->ops->map == NULL))
>                 return -ENODEV;
> 
> -       size         = PAGE_SIZE << gfp_order;
> +       /*
> +        * both the virtual address and the physical one, as well as
> +        * the size of the mapping, must be aligned (at least) to the
> +        * size of the smallest page supported by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
> +                       "0x%x\n", iova, (unsigned long)paddr,
> +                       size, domain->ops->min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
> +                               (unsigned long)paddr, size);
> +
> +       while (size) {
> +               unsigned long pgsize, addr_merge = iova | paddr;
> +               unsigned int pgsize_idx;
> +
> +               /* Max page size that still fits into 'size' */
> +               pgsize_idx = __fls(size);
> +
> +               /* need to consider alignment requirements ? */
> +               if (likely(addr_merge)) {
> +                       /* Max page size allowed by both iova and paddr */
> +                       unsigned int align_pgsize_idx = __ffs(addr_merge);
> +
> +                       pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> +               }
> +
> +               /* build a mask of acceptable page sizes */
> +               pgsize = (1UL << (pgsize_idx + 1)) - 1;
> 
> -       BUG_ON(!IS_ALIGNED(iova | paddr, size));
> +               /* throw away page sizes not supported by the hardware */
> +               pgsize &= domain->ops->pgsize_bitmap;
> 
> -       return domain->ops->map(domain, iova, paddr, size, prot);
> +               /* make sure we're still sane */
> +               BUG_ON(!pgsize);
> +
> +               /* pick the biggest page */
> +               pgsize_idx = __fls(pgsize);
> +               pgsize = 1UL << pgsize_idx;
> +
> +               pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
> +                                       (unsigned long)paddr, pgsize);
> +
> +               ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
> +               if (ret)
> +                       break;
> +
> +               iova += pgsize;
> +               paddr += pgsize;
> +               size -= pgsize;
> +       }
> +
> +       /* unroll mapping in case something went wrong */
> +       if (ret)
> +               iommu_unmap(domain, orig_iova, orig_size - size);
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
> 
> -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
> +size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> size_t size)
>  {
> -       size_t size, unmapped;
> +       size_t unmapped, left = size;
> 
>         if (unlikely(domain->ops->unmap == NULL))
>                 return -ENODEV;
> 
> -       size         = PAGE_SIZE << gfp_order;
> +       /*
> +        * The virtual address, as well as the size of the mapping, must be
> +        * aligned (at least) to the size of the smallest page supported
> +        * by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
> +                       iova, size, domain->ops->min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
> +
> +       while (left) {

This needs to be (left > 0). The drivers are allowed to unmap more than
requested, so this value may turn negative.

> +               unmapped = domain->ops->unmap(domain, iova, left);
> +               if (!unmapped)
> +                       break;
> 
> -       BUG_ON(!IS_ALIGNED(iova, size));
> +               pr_debug("unmapped: iova 0x%lx size %u\n", iova, unmapped);
> 
> -       unmapped = domain->ops->unmap(domain, iova, size);
> +               iova += unmapped;
> +               left -= unmapped;
> +       }
> 
> -       return get_order(unmapped);
> +       return size - left;
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
> 
> diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
> index e8fdb88..0b7b14c 100644
> --- a/drivers/iommu/omap-iovmm.c
> +++ b/drivers/iommu/omap-iovmm.c
> @@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain
> *domain, struct iovm_struct *new,
>         unsigned int i, j;
>         struct scatterlist *sg;
>         u32 da = new->da_start;
> -       int order;
> 
>         if (!domain || !sgt)
>                 return -EINVAL;
> @@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain
> *domain, struct iovm_struct *new,
>                 if (bytes_to_iopgsz(bytes) < 0)
>                         goto err_out;
> 
> -               order = get_order(bytes);
> -
>                 pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
>                          i, da, pa, bytes);
> 
> -               err = iommu_map(domain, da, pa, order, flags);
> +               err = iommu_map(domain, da, pa, bytes, flags);
>                 if (err)
>                         goto err_out;
> 
> @@ -448,10 +445,9 @@ err_out:
>                 size_t bytes;
> 
>                 bytes = sg->length + sg->offset;
> -               order = get_order(bytes);
> 
>                 /* ignore failures.. we're already handling one */
> -               iommu_unmap(domain, da, order);
> +               iommu_unmap(domain, da, bytes);
> 
>                 da += bytes;
>         }
> @@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain
> *domain, struct omap_iommu *obj,
>         size_t total = area->da_end - area->da_start;
>         const struct sg_table *sgt = area->sgt;
>         struct scatterlist *sg;
> -       int i, err;
> +       int i;
> +       size_t unmapped;
> 
>         BUG_ON(!sgtable_ok(sgt));
>         BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
> @@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain
> *domain, struct omap_iommu *obj,
>         start = area->da_start;
>         for_each_sg(sgt->sgl, sg, sgt->nents, i) {
>                 size_t bytes;
> -               int order;
> 
>                 bytes = sg->length + sg->offset;
> -               order = get_order(bytes);
> 
> -               err = iommu_unmap(domain, start, order);
> -               if (err < 0)
> +               unmapped = iommu_unmap(domain, start, bytes);
> +               if (unmapped < bytes)
>                         break;
> 
>                 dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 6b6ed21..76d7ce4 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -48,6 +48,22 @@ struct iommu_domain {
> 
>  #ifdef CONFIG_IOMMU_API
> 
> +/**
> + * struct iommu_ops - iommu ops and capabilities
> + * @domain_init: init iommu domain
> + * @domain_destroy: destroy iommu domain
> + * @attach_dev: attach device to an iommu domain
> + * @detach_dev: detach device from an iommu domain
> + * @map: map a physically contiguous memory region to an iommu domain
> + * @unmap: unmap a physically contiguous memory region from an iommu domain
> + * @iova_to_phys: translate iova to physical address
> + * @domain_has_cap: domain capabilities query
> + * @commit: commit iommu domain
> + * @pgsize_bitmap: bitmap of supported page sizes
> + * @min_pagesz: smallest page size supported. note: this member is private
> + *             to the IOMMU core, and maintained only for efficiency sake;
> + *             drivers don't need to set it.
> + */
>  struct iommu_ops {
>         int (*domain_init)(struct iommu_domain *domain);
>         void (*domain_destroy)(struct iommu_domain *domain);
> @@ -62,6 +78,8 @@ struct iommu_ops {
>         int (*domain_has_cap)(struct iommu_domain *domain,
>                               unsigned long cap);
>         void (*commit)(struct iommu_domain *domain);
> +       unsigned long pgsize_bitmap;
> +       unsigned int min_pagesz;
>  };
> 
>  extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
> @@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  extern void iommu_detach_device(struct iommu_domain *domain,
>                                 struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> -                    phys_addr_t paddr, int gfp_order, int prot);
> -extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> -                      int gfp_order);
> +                    phys_addr_t paddr, size_t size, int prot);
> +extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> +                      size_t size);
>  extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
>                                       unsigned long iova);
>  extern int iommu_domain_has_cap(struct iommu_domain *domain,
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index ca409be..35ed096 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct
> kvm_memory_slot *slot)
> 
>                 /* Map into IO address space */
>                 r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
> -                             get_order(page_size), flags);
> +                             page_size, flags);
>                 if (r) {
>                         printk(KERN_ERR "kvm_iommu_map_address:"
>                                "iommu failed to map pfn=%llx\n", pfn);
> @@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
> 
>         while (gfn < end_gfn) {
>                 unsigned long unmap_pages;
> -               int order;
> +               size_t size;
> 
>                 /* Get physical address */
>                 phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
>                 pfn  = phys >> PAGE_SHIFT;
> 
>                 /* Unmap address from IO address space */
> -               order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
> -               unmap_pages = 1ULL << order;
> +               size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
> +               unmap_pages = 1ULL << get_order(size);
> 
>                 /* Unpin all pages we just unmapped to not leak any memory */
>                 kvm_unpin_pages(kvm, pfn, unmap_pages);
> --
> 1.7.4.1

Btw, how have you tested this code?

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-14 13:35                             ` Roedel, Joerg
  (?)
@ 2011-10-14 17:03                               ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-14 17:03 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Fri, Oct 14, 2011 at 3:35 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> This patch looks good. Please include it in the page-size patch-set.

will do.

> Hmm, I'd like to constify the iommu_ops structures in the future. This
> wouldn't work then anymore. How about renaming it to io_page_size
> (because it is the base page-size of the iommu) and let the driver set
> it?

Sure, that'd work. Though asking the driver to provide this feels a
bit redundant, as it already provided this info in the supported page
sizes.

I guess a cleaner solution might be allocating a 'struct iommu' in
which drivers will set a 'const struct iommu_ops *' member. That new
struct would then contain the base page-size and any other state the
iommu core would ever need.
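
A rough sketch of that idea, with purely hypothetical names ('struct iommu'
and iommu_alloc() are not part of any patch here), assuming the core would
allocate it at registration time:

    struct iommu {
            const struct iommu_ops *ops;    /* driver-provided, can stay const */
            unsigned int min_pagesz;        /* derived once by the core */
    };

    static struct iommu *iommu_alloc(const struct iommu_ops *ops)
    {
            struct iommu *iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);

            if (!iommu)
                    return NULL;

            iommu->ops = ops;
            /* core-owned state; drivers never see or set it */
            iommu->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);

            return iommu;
    }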

This whole thing is quite marginal though and also very easy to change
later, so we can start with the "driver-provided io_page_size"
approach for now.

> This needs to be (left > 0). The drivers are allowed to unmap more than
> requested, so this value may turn negative.

Good point. 'left' is size_t though, so I'll fix this a bit differently.
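
To spell the pitfall out: with an unsigned 'left', a driver that unmaps more
than was asked for makes 'left -= unmapped' wrap around to a huge value
instead of going negative, and the loop keeps running. Counting upwards
avoids that; a minimal sketch of the shape used in the patch below:

    size_t unmapped = 0;

    /* stop once at least 'size' bytes are gone, or we hit a hole */
    while (unmapped < size) {
            size_t chunk = domain->ops->unmap(domain, iova, size - unmapped);

            if (!chunk)
                    break;

            iova += chunk;
            unmapped += chunk;
    }

    return unmapped;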

> Btw, how have you tested this code?

Of course. Every patch I send is always tested on both OMAP3 (with
omap3isp) and OMAP4 (with rpmsg/remoteproc). In addition, everyone who
uses my wip rpmsg tree (there are several developers who do) is always
testing this without even knowing, as I'm always rebasing that tree on
the latest iommu patches (my wip branches are not linear).

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-14 17:03                               ` Ohad Ben-Cohen
  (?)
@ 2011-10-14 17:05                                 ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-14 17:05 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Fri, Oct 14, 2011 at 7:03 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> Btw, how have you tested this code?
>
> Of course. Every patch I send is always tested on...

*oops I thought you asked "Btw, have you tested this code?". I missed
the "how" part :)

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-14 17:03                               ` Ohad Ben-Cohen
  (?)
@ 2011-10-17  8:05                                 ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17  8:05 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

Hi Joerg,

On Fri, Oct 14, 2011 at 7:03 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, Oct 14, 2011 at 3:35 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
>> Hmm, I'd like to constify the iommu_ops structures in the future. This
>> wouldn't work then anymore. How about renaming it to io_page_size
>> (because it is the base page-size of the iommu) and let the driver set
>> it?
>
> Sure, that'd work. Though asking the driver to provide this feels a
> bit redundant, as it already provided this info in the supported page
> sizes.
>
> I guess a cleaner solution might be allocating a 'struct iommu' in
> which drivers will set a 'const struct iommu_ops *' member. That new
> struct would then contain the base page-size and any other state the
> iommu core would ever need.
>
> This whole thing is quite marginal though and also very easy to change
> later, so we can start with the "driver-provided io_page_size"
> approach for now.

Sorry, I just couldn't get myself to sign off on this, as it really feels
wrong to me.

This min_pagesz member is just cached by the core so it doesn't need to
look it up every time we're mapping. Drivers shouldn't care about it, as it's
completely internal to the iommu core. I'm afraid that pushing this to the
drivers feels like redundant duplication of code and might also confuse
developers.

Let me please suggest two alternatives:
a) drop this min_pagesz cache completely. iommu core would then redundantly
  re-calculate this every time something is mapped, but I hardly believe there
  is going to be a measurable impact on performance.
b) keep the current implementation for now, and fix this later (when we constify
  struct iommu_ops *) by caching min_pagesz in a dynamically allocated iommu
  context. Since this future "constify" patch will anyway need to change 'struct
  bus_type', it would be a good opportunity to do this change at the same time.

I don't mind which of those approaches to take, and I also don't mind doing (b)
myself later, in a separate patch. Your call.

>> This needs to be (left > 0). The drivers are allowed to unmap more than
>> requested, so this value may turn negative.
>
> Good point. 'left' is size_t though, so I'll fix this a bit differently.

Fixed, please take a look:

>From 00b8b9373fe2d73da0280ac1e6ade4a701c95140 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Mon, 10 Oct 2011 23:50:55 +0200
Subject: [PATCH] iommu/core: split mapping to page sizes as supported
by the hardware

When mapping a memory region, split it to page sizes as supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce the TLB pressure.
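
For example, with a hardware bitmap of 4KiB|64KiB|1MiB|16MiB pages, a
1MiB+64KiB+4KiB region whose iova and physical address are both 1MiB-aligned
ends up as exactly three calls into the driver: one 1MiB page, one 64KiB page
and one 4KiB page. A stand-alone toy model of the selection loop (userspace
C, 64-bit longs assumed; the kernel code below uses __fls()/__ffs() instead
of the compiler builtins):

    #include <stdio.h>

    /* an OMAP-like set of supported sizes: 4KiB, 64KiB, 1MiB, 16MiB */
    #define PGSIZE_BITMAP ((1UL << 12) | (1UL << 16) | (1UL << 20) | (1UL << 24))

    int main(void)
    {
            unsigned long iova = 0x100000, paddr = 0x800000, size = 0x111000;

            while (size) {
                    unsigned long addr_merge = iova | paddr;
                    /* largest page that still fits into 'size' */
                    unsigned int idx = 63 - __builtin_clzl(size);

                    /* ...and that both addresses are aligned to */
                    if (addr_merge) {
                            unsigned int align = __builtin_ctzl(addr_merge);

                            if (align < idx)
                                    idx = align;
                    }

                    /* keep only hardware-supported sizes, pick the biggest */
                    unsigned long mask = ((1UL << (idx + 1)) - 1) & PGSIZE_BITMAP;
                    unsigned long pgsize = 1UL << (63 - __builtin_clzl(mask));

                    printf("map iova 0x%lx pa 0x%lx pgsize 0x%lx\n",
                           iova, paddr, pgsize);

                    iova += pgsize;
                    paddr += pgsize;
                    size -= pgsize;
            }

            return 0;
    }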

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.

This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.
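
From a caller's point of view only the units of the size argument change
(illustrative call; 'domain', 'da', 'pa' and 'prot' stand for whatever the
caller already has):

    /* before: size given as a page order, so odd-sized regions had to be
     * split up (or rounded up) by the caller itself */
    err = iommu_map(domain, da, pa, get_order(SZ_1M + SZ_4K), prot);

    /* after: size given in bytes; the core splits it into supported
     * page sizes internally */
    err = iommu_map(domain, da, pa, SZ_1M + SZ_4K, prot);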

Currently the supported page sizes are advertised once and they then
remain static. That works well for OMAP and MSM, but it would probably
not fly well with Intel's hardware, where the page size capabilities
may differ between DMA remapping devices.
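
A driver for hardware with, say, 4KiB, 64KiB, 1MiB and 16MiB pages announces
them once at registration, along these lines (sketch only; the macro and
function names are made up, the shape follows the omap patch in this series):

    #define MY_IOMMU_PGSIZES        (SZ_4K | SZ_64K | SZ_1M | SZ_16M)

    static struct iommu_ops my_iommu_ops = {
            .map            = my_iommu_map,
            .unmap          = my_iommu_unmap,
            .iova_to_phys   = my_iommu_iova_to_phys,
            .pgsize_bitmap  = MY_IOMMU_PGSIZES,
    };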

register_iommu() currently sets a default pgsize behavior, so we can convert
the IOMMU drivers in subsequent patches. After all the drivers
are converted, the temporary default settings will be removed.
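
The temporary default amounts to "every power-of-two size from 4KiB upwards",
which is what the order-based API effectively promised to drivers:

    /* what the ~0xFFFUL default expands to (on a 64-bit build) */
    unsigned long bitmap = ~0xFFFUL;               /* bits 12..63 set */
    unsigned int min_pagesz = 1 << __ffs(bitmap);  /* 0x1000, i.e. 4KiB */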

Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted
to deal with bytes instead of page order.

Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  124 +++++++++++++++++++++++++++++++++++++++-----
 drivers/iommu/omap-iovmm.c |   17 ++----
 include/linux/iommu.h      |   24 +++++++-
 virt/kvm/iommu.c           |    8 ++--
 4 files changed, 141 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5d042e8..26309fc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */

+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
@@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct
iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;

+	/*
+	 * Set the default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages.
+	 *
+	 * This will be removed once all drivers are migrated.
+	 */
+	if (!ops->pgsize_bitmap)
+		ops->pgsize_bitmap = ~0xFFFUL;
+
+	/* find out the minimum page size only once */
+	ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);
+
 	bus->iommu_ops = ops;

 	/* Do IOMMU specific setup for this bus-type */
@@ -157,35 +172,116 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);

 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	unsigned long orig_iova = iova;
+	size_t orig_size = size;
+	int ret = 0;

 	if (unlikely(domain->ops->map == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
+				(unsigned long)paddr, size);
+
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;
+
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= domain->ops->pgsize_bitmap;

-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+		/* make sure we're still sane */
+		BUG_ON(!pgsize);

-	return domain->ops->map(domain, iova, paddr, size, prot);
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
+					(unsigned long)paddr, pgsize);
+
+		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_unmap(domain, orig_iova, orig_size - size);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);

-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
size_t size)
 {
-	size_t size, unmapped;
+	size_t unmapped_page, unmapped = 0;

 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
-
-	BUG_ON(!IS_ALIGNED(iova, size));
-
-	unmapped = domain->ops->unmap(domain, iova, size);
-
-	return get_order(unmapped);
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
+			iova, size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
+
+	/*
+	 * Keep iterating until we either unmap 'size' bytes (or more)
+	 * or we hit an area that isn't mapped.
+	 */
+	while (unmapped < size) {
+		size_t left = size - unmapped;
+
+		unmapped_page = domain->ops->unmap(domain, iova, left);
+		if (!unmapped_page)
+			break;
+
+		pr_debug("unmapped: iova 0x%lx size %u\n", iova, unmapped_page);
+
+		iova += unmapped_page;
+		unmapped += unmapped_page;
+	}
+
+	return unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..0b7b14c 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain
*domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;

 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain
*domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;

-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);

-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;

@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);

 		da += bytes;
 	}
@@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain
*domain, struct omap_iommu *obj,
 	size_t total = area->da_end - area->da_start;
 	const struct sg_table *sgt = area->sgt;
 	struct scatterlist *sg;
-	int i, err;
+	int i;
+	size_t unmapped;

 	BUG_ON(!sgtable_ok(sgt));
 	BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
@@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain
*domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

-		err = iommu_unmap(domain, start, order);
-		if (err < 0)
+		unmapped = iommu_unmap(domain, start, bytes);
+		if (unmapped < bytes)
 			break;

 		dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6b6ed21..76d7ce4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,22 @@ struct iommu_domain {

 #ifdef CONFIG_IOMMU_API

+/**
+ * struct iommu_ops - iommu ops and capabilities
+ * @domain_init: init iommu domain
+ * @domain_destroy: destroy iommu domain
+ * @attach_dev: attach device to an iommu domain
+ * @detach_dev: detach device from an iommu domain
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @iova_to_phys: translate iova to physical address
+ * @domain_has_cap: domain capabilities query
+ * @commit: commit iommu domain
+ * @pgsize_bitmap: bitmap of supported page sizes
+ * @min_pagesz: smallest page size supported. note: this member is private
+ *		to the IOMMU core, and maintained only for efficiency sake;
+ *		drivers don't need to set it.
+ */
 struct iommu_ops {
 	int (*domain_init)(struct iommu_domain *domain);
 	void (*domain_destroy)(struct iommu_domain *domain);
@@ -62,6 +78,8 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	void (*commit)(struct iommu_domain *domain);
+	unsigned long pgsize_bitmap;
+	unsigned int min_pagesz;
 };

 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
-extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		     phys_addr_t paddr, size_t size, int prot);
+extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ca409be..35ed096 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct
kvm_memory_slot *slot)

 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,

 	while (gfn < end_gfn) {
 		unsigned long unmap_pages;
-		int order;
+		size_t size;

 		/* Get physical address */
 		phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
 		pfn  = phys >> PAGE_SHIFT;

 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
-		unmap_pages = 1ULL << order;
+		size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+		unmap_pages = 1ULL << get_order(size);

 		/* Unpin all pages we just unmapped to not leak any memory */
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-10-17  8:05                                 ` Ohad Ben-Cohen
  0 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17  8:05 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

Hi Joerg,

On Fri, Oct 14, 2011 at 7:03 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, Oct 14, 2011 at 3:35 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
>> Hmm, I'd like to constify the iommu_ops structures in the future. This
>> wouldn't work then anymore. How about renaming it to io_page_size
>> (because it is the base page-size of the iommu) and let the driver set
>> it?
>
> Sure, that'd work. Though asking the driver to provide this feels a
> bit redundant, as it already provided this info in the supported page
> sizes.
>
> I guess a cleaner solution might be allocating a 'struct iommu' in
> which drivers will set a 'const struct iommu_ops *' member. That new
> struct would then contain the base page-size and any other state the
> iommu core would ever need.
>
> This whole thing is quite marginal though and also very easy to change
> later, so we can start with the "driver-provided io_page_size"
> approach for now.

Sorry, I just couldn't get myself to sign-off this as it really feels
wrong to me.

This min_pagesz member is just cached by the core so it doesn't need to
look it up every time we're mapping. Drivers shouldn't care about it, as it's
completely internal to the iommu core. I'm afraid that pushing this to
the drivers
feels like redundant duplication of code and might also confuse developers.

Let me please suggest two alternatives:
a) drop this min_pagesz cache completely. iommu core would then redundantly
  re-calculate this every time something is mapped, but I hardly believe there
  is going to be a measurable impact on performance.
b) keep the current implementation for now, and fix this later (when we constify
  struct iommu_ops *) by caching min_pagesz in a dynamically allocated iommu
  context. Since this future "constify" patch will anyway need to change 'struct
  bus_type', it would be a good opportunity to do this change at the same time.

I don't mind which of those approaches to take, and I also don't mind doing (b)
myself later, in a separate patch. Your call.

>> This needs to be (left > 0). The drivers are allowed to unmap more then
>> requested, so this value may turn negative.
>
> Good point. 'left' is size_t though, so i'll fix this a bit differently.

Fixed, please take a look:

>From 00b8b9373fe2d73da0280ac1e6ade4a701c95140 Mon Sep 17 00:00:00 2001
From: Ohad Ben-Cohen <ohad@wizery.com>
Date: Mon, 10 Oct 2011 23:50:55 +0200
Subject: [PATCH] iommu/core: split mapping to page sizes as supported
by the hardware

When mapping a memory region, split it to page sizes as supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce the TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.

This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.

Currently the supported page sizes are advertised once and they then
remain static. That works well for OMAP and MSM but it would probably
not fly well with intel's hardware, where the page size capabilities
seem to have the potential to be different between several DMA
remapping devices.

register_iommu() currently sets a default pgsize behavior, so we can convert
the IOMMU drivers in subsequent patches. After all the drivers
are converted, the temporary default settings will be removed.

Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted
to deal with bytes instead of page order.

Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  124 +++++++++++++++++++++++++++++++++++++++-----
 drivers/iommu/omap-iovmm.c |   17 ++----
 include/linux/iommu.h      |   24 +++++++-
 virt/kvm/iommu.c           |    8 ++--
 4 files changed, 141 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5d042e8..26309fc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */

+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
@@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct
iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;

+	/*
+	 * Set the default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages.
+	 *
+	 * This will be removed once all drivers are migrated.
+	 */
+	if (!ops->pgsize_bitmap)
+		ops->pgsize_bitmap = ~0xFFFUL;
+
+	/* find out the minimum page size only once */
+	ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap);
+
 	bus->iommu_ops = ops;

 	/* Do IOMMU specific setup for this bus-type */
@@ -157,35 +172,116 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);

 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	unsigned long orig_iova = iova;
+	size_t orig_size = size;
+	int ret = 0;

 	if (unlikely(domain->ops->map == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova,
+				(unsigned long)paddr, size);
+
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;
+
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= domain->ops->pgsize_bitmap;

-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+		/* make sure we're still sane */
+		BUG_ON(!pgsize);

-	return domain->ops->map(domain, iova, paddr, size, prot);
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
+					(unsigned long)paddr, pgsize);
+
+		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_unmap(domain, orig_iova, orig_size - size);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);

-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
size_t size)
 {
-	size_t size, unmapped;
+	size_t unmapped_page, unmapped = 0;

 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;

-	size         = PAGE_SIZE << gfp_order;
-
-	BUG_ON(!IS_ALIGNED(iova, size));
-
-	unmapped = domain->ops->unmap(domain, iova, size);
-
-	return get_order(unmapped);
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n",
+			iova, size, domain->ops->min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size);
+
+	/*
+	 * Keep iterating until we either unmap 'size' bytes (or more)
+	 * or we hit an area that isn't mapped.
+	 */
+	while (unmapped < size) {
+		size_t left = size - unmapped;
+
+		unmapped_page = domain->ops->unmap(domain, iova, left);
+		if (!unmapped_page)
+			break;
+
+		pr_debug("unmapped: iova 0x%lx size %u\n", iova, unmapped_page);
+
+		iova += unmapped_page;
+		unmapped += unmapped_page;
+	}
+
+	return unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..0b7b14c 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain
*domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;

 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain
*domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;

-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);

-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;

@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);

 		da += bytes;
 	}
@@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain
*domain, struct omap_iommu *obj,
 	size_t total = area->da_end - area->da_start;
 	const struct sg_table *sgt = area->sgt;
 	struct scatterlist *sg;
-	int i, err;
+	int i;
+	size_t unmapped;

 	BUG_ON(!sgtable_ok(sgt));
 	BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
@@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain
*domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;

 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);

-		err = iommu_unmap(domain, start, order);
-		if (err < 0)
+		unmapped = iommu_unmap(domain, start, bytes);
+		if (unmapped < bytes)
 			break;

 		dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6b6ed21..76d7ce4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,22 @@ struct iommu_domain {

 #ifdef CONFIG_IOMMU_API

+/**
+ * struct iommu_ops - iommu ops and capabilities
+ * @domain_init: init iommu domain
+ * @domain_destroy: destroy iommu domain
+ * @attach_dev: attach device to an iommu domain
+ * @detach_dev: detach device from an iommu domain
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @iova_to_phys: translate iova to physical address
+ * @domain_has_cap: domain capabilities query
+ * @commit: commit iommu domain
+ * @pgsize_bitmap: bitmap of supported page sizes
+ * @min_pagesz: smallest page size supported. note: this member is private
+ *		to the IOMMU core, and maintained only for efficiency sake;
+ *		drivers don't need to set it.
+ */
 struct iommu_ops {
 	int (*domain_init)(struct iommu_domain *domain);
 	void (*domain_destroy)(struct iommu_domain *domain);
@@ -62,6 +78,8 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	void (*commit)(struct iommu_domain *domain);
+	unsigned long pgsize_bitmap;
+	unsigned int min_pagesz;
 };

 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
-extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		     phys_addr_t paddr, size_t size, int prot);
+extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ca409be..35ed096 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct
kvm_memory_slot *slot)

 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,

 	while (gfn < end_gfn) {
 		unsigned long unmap_pages;
-		int order;
+		size_t size;

 		/* Get physical address */
 		phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
 		pfn  = phys >> PAGE_SHIFT;

 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
-		unmap_pages = 1ULL << order;
+		size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+		unmap_pages = 1ULL << get_order(size);

 		/* Unpin all pages we just unmapped to not leak any memory */
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-10-17  8:05                                 ` Ohad Ben-Cohen
  0 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17  8:05 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joerg,

On Fri, Oct 14, 2011 at 7:03 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, Oct 14, 2011 at 3:35 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
>> Hmm, I'd like to constify the iommu_ops structures in the future. This
>> wouldn't work then anymore. How about renaming it to io_page_size
>> (because it is the base page-size of the iommu) and let the driver set
>> it?
>
> Sure, that'd work. Though asking the driver to provide this feels a
> bit redundant, as it already provided this info in the supported page
> sizes.
>
> I guess a cleaner solution might be allocating a 'struct iommu' in
> which drivers will set a 'const struct iommu_ops *' member. That new
> struct would then contain the base page-size and any other state the
> iommu core would ever need.
>
> This whole thing is quite marginal though and also very easy to change
> later, so we can start with the "driver-provided io_page_size"
> approach for now.

Sorry, I just couldn't get myself to sign-off this as it really feels
wrong to me.

This min_pagesz member is just cached by the core so it doesn't need to
look it up every time we're mapping. Drivers shouldn't care about it, as it's
completely internal to the iommu core. I'm afraid that pushing this to
the drivers
feels like redundant duplication of code and might also confuse developers.

Let me please suggest two alternatives:
a) drop this min_pagesz cache completely. iommu core would then redundantly
  re-calculate this every time something is mapped, but I hardly believe there
  is going to be a measurable impact on performance.
b) keep the current implementation for now, and fix this later (when we constify
  struct iommu_ops *) by caching min_pagesz in a dynamically allocated iommu
  context. Since this future "constify" patch will anyway need to change 'struct
  bus_type', it would be a good opportunity to do this change at the same time.

I don't mind which of those approaches we take, and I also don't mind doing
(b) myself later, in a separate patch. Your call.

>> This needs to be (left > 0). The drivers are allowed to unmap more than
>> requested, so this value may turn negative.
>
> Good point. 'left' is size_t though, so I'll fix this a bit differently.

Fixed, please take a look:
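
As a rough illustration of the point being fixed (a sketch only; the names
and the ops->unmap() signature are assumed, not taken from the patch), the
loop can count how many bytes have been unmapped so far instead of
decrementing a 'left' counter that a driver could push below zero:

static size_t example_unmap(struct iommu_domain *domain,
			    unsigned long iova, size_t size)
{
	size_t unmapped = 0;

	while (unmapped < size) {
		/* a driver may legitimately unmap more than we asked for */
		size_t len = domain->ops->unmap(domain, iova, size - unmapped);

		if (!len)
			break;

		iova     += len;
		unmapped += len;
	}

	return unmapped;
}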

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-17  8:05                                 ` Ohad Ben-Cohen
@ 2011-10-17  9:28                                   ` Roedel, Joerg
  -1 siblings, 0 replies; 106+ messages in thread
From: Roedel, Joerg @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

Hi Ohad,

On Mon, Oct 17, 2011 at 04:05:02AM -0400, Ohad Ben-Cohen wrote:

> > This whole thing is quite marginal though and also very easy to change
> > later, so we can start with the "driver-provided io_page_size"
> > approach for now.
> 
> Sorry, I just couldn't get myself to sign off on this, as it really feels
> wrong to me.
> 
> This min_pagesz member is just cached by the core so it doesn't need to
> recompute the minimum page size every time we're mapping. Drivers shouldn't
> care about it, as it's completely internal to the iommu core. I'm afraid
> that pushing this to the drivers feels like needless duplication of code
> and might also confuse developers.
> 
> Let me please suggest two alternatives:
> a) drop this min_pagesz cache completely. The iommu core would then
>    recalculate this every time something is mapped, but I doubt there would
>    be a measurable impact on performance.
> b) keep the current implementation for now, and fix this later (when we
>    constify struct iommu_ops *) by caching min_pagesz in a dynamically
>    allocated iommu context. Since that future "constify" patch will need to
>    change 'struct bus_type' anyway, it would be a good opportunity to make
>    this change at the same time.
> 
> I don't mind which of those approaches we take, and I also don't mind doing
> (b) myself later, in a separate patch. Your call.

I think option a) is the best. It should add only minimal overhead to
the iommu_map path.

> 
> >> This needs to be (left > 0). The drivers are allowed to unmap more than
> >> requested, so this value may turn negative.
> >
> > Good point. 'left' is size_t though, so I'll fix this a bit differently.
> 
> Fixed, please take a look:
> 
> >From 00b8b9373fe2d73da0280ac1e6ade4a701c95140 Mon Sep 17 00:00:00 2001
> From: Ohad Ben-Cohen <ohad@wizery.com>
> Date: Mon, 10 Oct 2011 23:50:55 +0200
> Subject: [PATCH] iommu/core: split mapping to page sizes as supported
> by the hardware
> 
> When mapping a memory region, split it to page sizes as supported
> by the iommu hardware. Always prefer bigger pages, when possible,
> in order to reduce the TLB pressure.
> 
> The logic to do that is now added to the IOMMU core, so neither the iommu
> drivers themselves nor users of the IOMMU API have to duplicate it.
> 
> This allows a more lenient granularity of mappings; traditionally the
> IOMMU API took 'order' (of a page) as a mapping size, and directly let
> the low level iommu drivers handle the mapping, but now that the IOMMU
> core can split arbitrary memory regions into pages, we can remove this
> limitation, so users don't have to split those regions by themselves.
> 
> Currently the supported page sizes are advertised once and they then
> remain static. That works well for OMAP and MSM, but it would probably
> not fly well with Intel's hardware, where the page size capabilities
> may differ between several DMA remapping devices.
> 
> register_iommu() currently sets a default pgsize behavior, so we can convert
> the IOMMU drivers in subsequent patches. After all the drivers
> are converted, the temporary default settings will be removed.
> 
> Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted
> to deal with bytes instead of page order.
> 
> Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!
> 
> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: David Brown <davidb@codeaurora.org>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Joerg Roedel <Joerg.Roedel@amd.com>
> Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
> Cc: KyongHo Cho <pullip.cho@samsung.com>
> Cc: Hiroshi DOYU <hdoyu@nvidia.com>
> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> Cc: kvm@vger.kernel.org
> ---
>  drivers/iommu/iommu.c      |  124 +++++++++++++++++++++++++++++++++++++++-----
>  drivers/iommu/omap-iovmm.c |   17 ++----
>  include/linux/iommu.h      |   24 +++++++-
>  virt/kvm/iommu.c           |    8 ++--
>  4 files changed, 141 insertions(+), 32 deletions(-)
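
For reference, the splitting logic under discussion boils down to a greedy,
largest-page-first loop. The sketch below is a simplified reconstruction,
not the patch itself; the pgsize_bitmap field and the size-based ->map()
signature are assumptions based on this thread:

static int example_iommu_map(struct iommu_domain *domain, unsigned long iova,
			     phys_addr_t paddr, size_t size, int prot)
{
	int ret = 0;

	/* iova, paddr and size are assumed aligned to the minimum page size */
	while (size) {
		unsigned long addr_merge = iova | paddr;
		unsigned int pgsize_idx;
		unsigned long pgsize;

		/* biggest page order that still fits into 'size'... */
		pgsize_idx = __fls(size);

		/* ...and that both iova and paddr are aligned to */
		if (addr_merge)
			pgsize_idx = min_t(unsigned int, pgsize_idx,
					   __ffs(addr_merge));

		/* keep only sizes the hardware supports, pick the biggest */
		pgsize = ((1UL << (pgsize_idx + 1)) - 1) &
			 domain->ops->pgsize_bitmap;
		pgsize = 1UL << __fls(pgsize);

		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
		if (ret)
			break;

		iova  += pgsize;
		paddr += pgsize;
		size  -= pgsize;
	}

	return ret;
}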

The patch looks good now. Please implement option a) and it should be
fine. I will test it on an AMD IOMMU platform. We still need someone to
test it on VT-d.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-17  9:28                                   ` Roedel, Joerg
@ 2011-10-17  9:55                                     ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 106+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17  9:55 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Stepan Moskovchenko, kvm, Hiroshi Doyu

On Mon, Oct 17, 2011 at 11:28 AM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> The patch looks good now. Please implement option a) and it should be
> fine.

Ok, will send in a few moments.

> I will test it on an AMD IOMMU platform. We still need someone to
> test it on VT-d.

That would be great, yeah.

Though by declaring support for variable page sizes, we intentionally
take a very conservative approach, which should effectively make this
whole logic transparent to both x86 implementations:

The ->map() handlers of both the AMD and the VT-d drivers should be
invoked exactly as they were invoked before (same order, same
parameters), because before this patch set, all IOMMU users only
mapped page-order-sized regions. With this patch set in hand, the x86
drivers declare support for all possible page-order page sizes, so
the IOMMU core should just be a pass-through, and call ->map() with
the exact parameters that were provided to iommu_map().
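
As a sketch of what declaring all page-order page sizes can look like (the
exact constant is an assumption for illustration, not quoted from the
patches), an x86 driver can simply advertise every power-of-two size from
4 KiB upwards, so a page-order-aligned request reaches ->map() in a single
call with its original size:

/* every power-of-two page size >= 4 KiB is marked as supported (sketch) */
#define EXAMPLE_X86_IOMMU_PGSIZES	(~0xFFFUL)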

Similarly, calls to ->unmap() should not change, because the current
mainline x86 users (kvm, for instance) only unmap a single page at a time.
With this patch set in hand, the iommu core will just trivially pass those
requests through to the driver's ->unmap().

So I guess that if we can test this on AMD and see that this
conservative approach indeed works (the driver gets the same ->map() and
->unmap() calls it got before), then it should work exactly the same
with VT-d.

But having someone with a VT-d setup test this would be great, of course.

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 106+ messages in thread

end of thread, other threads:[~2011-10-17  9:56 UTC | newest]

Thread overview: 106+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
2011-09-16 17:51 ` Ohad Ben-Cohen
2011-09-16 17:51 ` Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 1/6] iommu/core: " Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-27 10:05   ` Roedel, Joerg
2011-09-27 10:05     ` Roedel, Joerg
2011-09-27 10:05     ` Roedel, Joerg
2011-09-27 12:26     ` Ohad Ben-Cohen
2011-09-27 12:26       ` Ohad Ben-Cohen
2011-09-27 12:26       ` Ohad Ben-Cohen
2011-09-27 13:12       ` Roedel, Joerg
2011-09-27 13:12         ` Roedel, Joerg
2011-09-27 13:12         ` Roedel, Joerg
2011-09-27 13:28         ` Ohad Ben-Cohen
2011-09-27 13:28           ` Ohad Ben-Cohen
2011-09-27 13:28           ` Ohad Ben-Cohen
2011-09-27 18:14           ` Roedel, Joerg
2011-09-27 18:14             ` Roedel, Joerg
2011-09-27 18:14             ` Roedel, Joerg
2011-10-02 15:58             ` Ohad Ben-Cohen
2011-10-02 15:58               ` Ohad Ben-Cohen
2011-10-02 15:58               ` Ohad Ben-Cohen
2011-10-10  7:40               ` Ohad Ben-Cohen
2011-10-10  7:40                 ` Ohad Ben-Cohen
2011-10-10  7:40                 ` Ohad Ben-Cohen
2011-10-10  9:47               ` Roedel, Joerg
2011-10-10  9:47                 ` Roedel, Joerg
2011-10-10  9:47                 ` Roedel, Joerg
2011-10-10 13:59                 ` Ohad Ben-Cohen
2011-10-10 13:59                   ` Ohad Ben-Cohen
2011-10-10 13:59                   ` Ohad Ben-Cohen
2011-10-10 15:36                   ` Roedel, Joerg
2011-10-10 15:36                     ` Roedel, Joerg
2011-10-10 15:36                     ` Roedel, Joerg
2011-10-10 17:02                     ` Ohad Ben-Cohen
2011-10-10 17:02                       ` Ohad Ben-Cohen
2011-10-10 17:02                       ` Ohad Ben-Cohen
2011-10-10 22:01                       ` Ohad Ben-Cohen
2011-10-10 22:01                         ` Ohad Ben-Cohen
2011-10-10 22:01                         ` Ohad Ben-Cohen
2011-10-11 10:21                         ` Roedel, Joerg
2011-10-11 10:21                           ` Roedel, Joerg
2011-10-11 10:21                           ` Roedel, Joerg
2011-10-11 10:19                       ` Roedel, Joerg
2011-10-11 10:19                         ` Roedel, Joerg
2011-10-11 10:19                         ` Roedel, Joerg
2011-10-10 22:49                     ` Ohad Ben-Cohen
2011-10-10 22:49                       ` Ohad Ben-Cohen
2011-10-10 22:49                       ` Ohad Ben-Cohen
2011-10-11 10:38                       ` Roedel, Joerg
2011-10-11 10:38                         ` Roedel, Joerg
2011-10-11 10:38                         ` Roedel, Joerg
2011-10-11 17:01                         ` Ohad Ben-Cohen
2011-10-11 17:01                           ` Ohad Ben-Cohen
2011-10-11 17:01                           ` Ohad Ben-Cohen
2011-10-14 13:35                           ` Roedel, Joerg
2011-10-14 13:35                             ` Roedel, Joerg
2011-10-14 13:35                             ` Roedel, Joerg
2011-10-14 17:03                             ` Ohad Ben-Cohen
2011-10-14 17:03                               ` Ohad Ben-Cohen
2011-10-14 17:03                               ` Ohad Ben-Cohen
2011-10-14 17:05                               ` Ohad Ben-Cohen
2011-10-14 17:05                                 ` Ohad Ben-Cohen
2011-10-14 17:05                                 ` Ohad Ben-Cohen
2011-10-17  8:05                               ` Ohad Ben-Cohen
2011-10-17  8:05                                 ` Ohad Ben-Cohen
2011-10-17  8:05                                 ` Ohad Ben-Cohen
2011-10-17  9:28                                 ` Roedel, Joerg
2011-10-17  9:28                                   ` Roedel, Joerg
2011-10-17  9:28                                   ` Roedel, Joerg
2011-10-17  9:55                                   ` Ohad Ben-Cohen
2011-10-17  9:55                                     ` Ohad Ben-Cohen
2011-10-17  9:55                                     ` Ohad Ben-Cohen
2011-10-11 14:59                       ` KyongHo Cho
2011-10-11 14:59                         ` KyongHo Cho
2011-10-11 14:59                         ` KyongHo Cho
2011-10-11 17:04                         ` Ohad Ben-Cohen
2011-10-11 17:04                           ` Ohad Ben-Cohen
2011-10-11 17:04                           ` Ohad Ben-Cohen
2011-10-10 12:52               ` KyongHo Cho
2011-10-10 12:52                 ` KyongHo Cho
2011-10-10 12:52                 ` KyongHo Cho
2011-10-10 14:00                 ` Ohad Ben-Cohen
2011-10-10 14:00                   ` Ohad Ben-Cohen
2011-10-10 14:00                   ` Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 2/6] iommu/omap: announce supported page sizes Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 3/6] iommu/msm: " Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-25  5:03   ` David Brown
2011-09-25  5:03     ` David Brown
2011-09-16 17:51 ` [PATCH v3 4/6] iommu/amd: " Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 5/6] iommu/intel: " Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 6/6] iommu/core: remove the temporary register_iommu_pgsize API Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-16 17:51   ` Ohad Ben-Cohen
2011-09-27  8:56 ` [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
2011-09-27  8:56   ` Ohad Ben-Cohen
