* [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen

v3->v4:
- simplify splitter logic (Joerg)
- declare supported page-sizes in the iommu_ops, without extending iommu_register (Joerg)
- iommu_unmap should now return bytes too (Joerg)
- don't cache min_pgsize anymore (Joerg)
- handle cases when ->unmap() actually unmaps more than requested (Joerg)
- completely unroll the mapping in case iommu_map fails partway through (KyongHo)
- use size_t for size parameters (KyongHo)
- add a patch to remove the bytes->order->bytes conversion we had
- rebase onto the master branch of the iommu tree (Joerg)

v2->v3:
- s/KB/KiB/ (David W)

v1->v2:
- split into patches (by keeping the old code around until all drivers are converted) (Joerg)

Tested with OMAP3 (omap3isp) and OMAP4 (rpmsg/remoteproc).
Compile-tested on x86_64.
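
For reference, here is a minimal, illustrative sketch (not part of this series)
of what an IOMMU API user looks like once the conversion is done: both
iommu_map() and iommu_unmap() take a size in bytes, and iommu_unmap() returns
the number of bytes actually unmapped. The domain/iova/paddr variables and the
SZ_16K constant are assumed for the example only.

	/* illustrative only: map 16KiB of physically contiguous memory */
	ret = iommu_map(domain, iova, paddr, SZ_16K, IOMMU_READ | IOMMU_WRITE);
	if (ret)
		return ret;

	/* ... use the mapping ... */

	/* tear it down; the return value is the number of bytes unmapped */
	unmapped = iommu_unmap(domain, iova, SZ_16K);
	if (unmapped < SZ_16K)
		pr_warn("partially unmapped: %zu bytes\n", unmapped);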

Ohad Ben-Cohen (7):
  iommu/core: stop converting bytes to page order back and forth
  iommu/core: split mapping to page sizes as supported by the hardware
  iommu/omap: announce supported page sizes
  iommu/msm: announce supported page sizes
  iommu/amd: announce supported page sizes
  iommu/intel: announce supported page sizes
  iommu/core: remove the temporary pgsize settings

 drivers/iommu/amd_iommu.c   |   32 +++++++++---
 drivers/iommu/intel-iommu.c |   30 ++++++++---
 drivers/iommu/iommu.c       |  119 ++++++++++++++++++++++++++++++++++++++----
 drivers/iommu/msm_iommu.c   |   25 ++++-----
 drivers/iommu/omap-iommu.c  |   18 +++---
 drivers/iommu/omap-iovmm.c  |   17 ++----
 include/linux/iommu.h       |   26 +++++++--
 virt/kvm/iommu.c            |    8 ++--
 8 files changed, 205 insertions(+), 70 deletions(-)

-- 
1.7.4.1


* [PATCH v4 1/7] iommu/core: stop converting bytes to page order back and forth
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen

Express sizes in bytes rather than in page order, to eliminate the
size->order->size conversions we do whenever the IOMMU API calls the
low-level drivers' map/unmap methods.

Adapt all existing drivers accordingly.
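
To illustrate the driver-facing change, here is a minimal before/after sketch
of an unmap handler for a hypothetical "foo" driver (the foo_* names are
illustrative only and not part of this patch):

	/* before: size passed as a page order, order returned on success */
	static int foo_iommu_unmap(struct iommu_domain *domain,
				   unsigned long iova, int gfp_order)
	{
		size_t size = PAGE_SIZE << gfp_order;

		/* ... clear the mapping at iova, size bytes ... */

		return gfp_order;
	}

	/* after: size passed in bytes, bytes actually unmapped returned */
	static size_t foo_iommu_unmap(struct iommu_domain *domain,
				      unsigned long iova, size_t size)
	{
		/* ... clear the mapping at iova, size bytes ... */

		return size;
	}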

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/iommu/amd_iommu.c   |   13 +++++--------
 drivers/iommu/intel-iommu.c |   11 ++++-------
 drivers/iommu/iommu.c       |    8 +++++---
 drivers/iommu/msm_iommu.c   |   19 +++++++------------
 drivers/iommu/omap-iommu.c  |   14 +++++---------
 include/linux/iommu.h       |    6 +++---
 6 files changed, 29 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 764e3da..32d502e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2702,9 +2702,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
 }
 
 static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
-			 phys_addr_t paddr, int gfp_order, int iommu_prot)
+			 phys_addr_t paddr, size_t page_size, int iommu_prot)
 {
-	unsigned long page_size = 0x1000UL << gfp_order;
 	struct protection_domain *domain = dom->priv;
 	int prot = 0;
 	int ret;
@@ -2721,13 +2720,11 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	return ret;
 }
 
-static int amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
-			   int gfp_order)
+static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
+			   size_t page_size)
 {
 	struct protection_domain *domain = dom->priv;
-	unsigned long page_size, unmap_size;
-
-	page_size  = 0x1000UL << gfp_order;
+	size_t unmap_size;
 
 	mutex_lock(&domain->api_lock);
 	unmap_size = iommu_unmap_page(domain, iova, page_size);
@@ -2735,7 +2732,7 @@ static int amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 
 	domain_flush_tlb_pde(domain);
 
-	return get_order(unmap_size);
+	return unmap_size;
 }
 
 static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2d53c3d..96a9bd4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3821,12 +3821,11 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
 
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
-			   int gfp_order, int iommu_prot)
+			   size_t size, int iommu_prot)
 {
 	struct dmar_domain *dmar_domain = domain->priv;
 	u64 max_addr;
 	int prot = 0;
-	size_t size;
 	int ret;
 
 	if (iommu_prot & IOMMU_READ)
@@ -3836,7 +3835,6 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping)
 		prot |= DMA_PTE_SNP;
 
-	size     = PAGE_SIZE << gfp_order;
 	max_addr = iova + size;
 	if (dmar_domain->max_addr < max_addr) {
 		u64 end;
@@ -3859,11 +3857,10 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	return ret;
 }
 
-static int intel_iommu_unmap(struct iommu_domain *domain,
-			     unsigned long iova, int gfp_order)
+static size_t intel_iommu_unmap(struct iommu_domain *domain,
+			     unsigned long iova, size_t size)
 {
 	struct dmar_domain *dmar_domain = domain->priv;
-	size_t size = PAGE_SIZE << gfp_order;
 
 	dma_pte_clear_range(dmar_domain, iova >> VTD_PAGE_SHIFT,
 			    (iova + size - 1) >> VTD_PAGE_SHIFT);
@@ -3871,7 +3868,7 @@ static int intel_iommu_unmap(struct iommu_domain *domain,
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;
 
-	return gfp_order;
+	return size;
 }
 
 static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63ca9ba..5d042e8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -168,13 +168,13 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 
 	BUG_ON(!IS_ALIGNED(iova | paddr, size));
 
-	return domain->ops->map(domain, iova, paddr, gfp_order, prot);
+	return domain->ops->map(domain, iova, paddr, size, prot);
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
 int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
 {
-	size_t size;
+	size_t size, unmapped;
 
 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;
@@ -183,7 +183,9 @@ int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
 
 	BUG_ON(!IS_ALIGNED(iova, size));
 
-	return domain->ops->unmap(domain, iova, gfp_order);
+	unmapped = domain->ops->unmap(domain, iova, size);
+
+	return get_order(unmapped);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 5865dd2..13718d9 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -352,7 +352,7 @@ fail:
 }
 
 static int msm_iommu_map(struct iommu_domain *domain, unsigned long va,
-			 phys_addr_t pa, int order, int prot)
+			 phys_addr_t pa, size_t len, int prot)
 {
 	struct msm_priv *priv;
 	unsigned long flags;
@@ -363,7 +363,6 @@ static int msm_iommu_map(struct iommu_domain *domain, unsigned long va,
 	unsigned long *sl_pte;
 	unsigned long sl_offset;
 	unsigned int pgprot;
-	size_t len = 0x1000UL << order;
 	int ret = 0, tex, sh;
 
 	spin_lock_irqsave(&msm_iommu_lock, flags);
@@ -463,8 +462,8 @@ fail:
 	return ret;
 }
 
-static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
-			    int order)
+static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
+			    size_t len)
 {
 	struct msm_priv *priv;
 	unsigned long flags;
@@ -474,7 +473,6 @@ static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
 	unsigned long *sl_table;
 	unsigned long *sl_pte;
 	unsigned long sl_offset;
-	size_t len = 0x1000UL << order;
 	int i, ret = 0;
 
 	spin_lock_irqsave(&msm_iommu_lock, flags);
@@ -544,15 +542,12 @@ static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long va,
 
 	ret = __flush_iotlb(domain);
 
-	/*
-	 * the IOMMU API requires us to return the order of the unmapped
-	 * page (on success).
-	 */
-	if (!ret)
-		ret = order;
 fail:
 	spin_unlock_irqrestore(&msm_iommu_lock, flags);
-	return ret;
+
+	/* the IOMMU API requires us to return how many bytes were unmapped */
+	len = ret ? 0 : len;
+	return len;
 }
 
 static phys_addr_t msm_iommu_iova_to_phys(struct iommu_domain *domain,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 9ee2c38..1bb2971 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1019,12 +1019,11 @@ static void iopte_cachep_ctor(void *iopte)
 }
 
 static int omap_iommu_map(struct iommu_domain *domain, unsigned long da,
-			 phys_addr_t pa, int order, int prot)
+			 phys_addr_t pa, size_t bytes, int prot)
 {
 	struct omap_iommu_domain *omap_domain = domain->priv;
 	struct omap_iommu *oiommu = omap_domain->iommu_dev;
 	struct device *dev = oiommu->dev;
-	size_t bytes = PAGE_SIZE << order;
 	struct iotlb_entry e;
 	int omap_pgsz;
 	u32 ret, flags;
@@ -1049,19 +1048,16 @@ static int omap_iommu_map(struct iommu_domain *domain, unsigned long da,
 	return ret;
 }
 
-static int omap_iommu_unmap(struct iommu_domain *domain, unsigned long da,
-			    int order)
+static size_t omap_iommu_unmap(struct iommu_domain *domain, unsigned long da,
+			    size_t size)
 {
 	struct omap_iommu_domain *omap_domain = domain->priv;
 	struct omap_iommu *oiommu = omap_domain->iommu_dev;
 	struct device *dev = oiommu->dev;
-	size_t unmap_size;
 
-	dev_dbg(dev, "unmapping da 0x%lx order %d\n", da, order);
+	dev_dbg(dev, "unmapping da 0x%lx size %u\n", da, size);
 
-	unmap_size = iopgtable_clear_entry(oiommu, da);
-
-	return unmap_size ? get_order(unmap_size) : -EINVAL;
+	return iopgtable_clear_entry(oiommu, da);
 }
 
 static int
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 710291f..6b6ed21 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -54,9 +54,9 @@ struct iommu_ops {
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
-		   phys_addr_t paddr, int gfp_order, int prot);
-	int (*unmap)(struct iommu_domain *domain, unsigned long iova,
-		     int gfp_order);
+		   phys_addr_t paddr, size_t size, int prot);
+	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
+		     size_t size);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
 				    unsigned long iova);
 	int (*domain_has_cap)(struct iommu_domain *domain,
-- 
1.7.4.1


* [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen,
	kvm

When mapping a memory region, split it into the page sizes supported
by the IOMMU hardware. Always prefer bigger pages, when possible, in
order to reduce TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.

This allows a more flexible mapping granularity: traditionally the
IOMMU API took a page 'order' as the mapping size and let the low-level
IOMMU drivers handle the mapping directly, but now that the IOMMU core
can split arbitrary memory regions into pages, that limitation can be
removed, and users no longer have to split those regions by themselves.

Currently the supported page sizes are advertised once and then remain
static. That works well for OMAP and MSM, but it would probably not suit
Intel's hardware, where the page-size capabilities may differ between
individual DMA remapping devices.

bus_set_iommu() currently sets a default pgsize behavior, so the IOMMU
drivers can be converted in subsequent patches. After all the drivers
are converted, the temporary default settings will be removed.

Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted to
deal with bytes instead of page order.

Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!
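
As an illustration of the new interface (a sketch only; the actual OMAP page
sizes are announced in a later patch of this series, and the foo_* names are
hypothetical), a driver advertises its supported page sizes once in its
iommu_ops, and the core then does the splitting:

	/* hypothetical driver supporting 4KiB, 64KiB, 1MiB and 16MiB pages */
	#define FOO_IOMMU_PGSIZES	(SZ_4K | SZ_64K | SZ_1M | SZ_16M)

	static struct iommu_ops foo_iommu_ops = {
		.map		= foo_iommu_map,
		.unmap		= foo_iommu_unmap,
		/* ... the remaining ops ... */
		.pgsize_bitmap	= FOO_IOMMU_PGSIZES,
	};

With such a bitmap, an iommu_map() call for a 1MiB + 4KiB region whose iova
and paddr are both 1MiB-aligned is split by the core into a single 1MiB
mapping followed by a single 4KiB mapping, instead of 257 4KiB mappings.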

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  131 +++++++++++++++++++++++++++++++++++++++-----
 drivers/iommu/omap-iovmm.c |   17 ++----
 include/linux/iommu.h      |   20 ++++++-
 virt/kvm/iommu.c           |    8 +-
 4 files changed, 144 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5d042e8..91df958 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
@@ -47,6 +49,16 @@ int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;
 
+	/*
+	 * Set the default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages.
+	 *
+	 * This will be removed once all drivers are migrated.
+	 */
+	if (!ops->pgsize_bitmap)
+		ops->pgsize_bitmap = ~0xFFFUL;
+
 	bus->iommu_ops = ops;
 
 	/* Do IOMMU specific setup for this bus-type */
@@ -157,35 +169,126 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
 
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	unsigned long orig_iova = iova;
+	unsigned int min_pagesz;
+	size_t orig_size = size;
+	int ret = 0;
 
 	if (unlikely(domain->ops->map == NULL))
 		return -ENODEV;
 
-	size         = PAGE_SIZE << gfp_order;
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
+
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			(unsigned long)size, min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
+				(unsigned long)paddr, (unsigned long)size);
+
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;
+
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= domain->ops->pgsize_bitmap;
 
-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+		/* make sure we're still sane */
+		BUG_ON(!pgsize);
 
-	return domain->ops->map(domain, iova, paddr, size, prot);
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
+					(unsigned long)paddr, pgsize);
+
+		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_unmap(domain, orig_iova, orig_size - size);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	size_t size, unmapped;
+	size_t unmapped_page, unmapped = 0;
+	unsigned int min_pagesz;
 
 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;
 
-	size         = PAGE_SIZE << gfp_order;
-
-	BUG_ON(!IS_ALIGNED(iova, size));
-
-	unmapped = domain->ops->unmap(domain, iova, size);
-
-	return get_order(unmapped);
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
+
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n",
+					iova, (unsigned long)size, min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova,
+							(unsigned long)size);
+
+	/*
+	 * Keep iterating until we either unmap 'size' bytes (or more)
+	 * or we hit an area that isn't mapped.
+	 */
+	while (unmapped < size) {
+		size_t left = size - unmapped;
+
+		unmapped_page = domain->ops->unmap(domain, iova, left);
+		if (!unmapped_page)
+			break;
+
+		pr_debug("unmapped: iova 0x%lx size %lx\n", iova,
+					(unsigned long)unmapped_page);
+
+		iova += unmapped_page;
+		unmapped += unmapped_page;
+	}
+
+	return unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..0b7b14c 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;
 
 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;
 
-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);
 
-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;
 
@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;
 
 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);
 
 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);
 
 		da += bytes;
 	}
@@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	size_t total = area->da_end - area->da_start;
 	const struct sg_table *sgt = area->sgt;
 	struct scatterlist *sg;
-	int i, err;
+	int i;
+	size_t unmapped;
 
 	BUG_ON(!sgtable_ok(sgt));
 	BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
@@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;
 
 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);
 
-		err = iommu_unmap(domain, start, order);
-		if (err < 0)
+		unmapped = iommu_unmap(domain, start, bytes);
+		if (unmapped < bytes)
 			break;
 
 		dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6b6ed21..1f0f992 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,19 @@ struct iommu_domain {
 
 #ifdef CONFIG_IOMMU_API
 
+/**
+ * struct iommu_ops - iommu ops and capabilities
+ * @domain_init: init iommu domain
+ * @domain_destroy: destroy iommu domain
+ * @attach_dev: attach device to an iommu domain
+ * @detach_dev: detach device from an iommu domain
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @iova_to_phys: translate iova to physical address
+ * @domain_has_cap: domain capabilities query
+ * @commit: commit iommu domain
+ * @pgsize_bitmap: bitmap of supported page sizes
+ */
 struct iommu_ops {
 	int (*domain_init)(struct iommu_domain *domain);
 	void (*domain_destroy)(struct iommu_domain *domain);
@@ -62,6 +75,7 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	void (*commit)(struct iommu_domain *domain);
+	unsigned long pgsize_bitmap;
 };
 
 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -73,9 +87,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
-extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		     phys_addr_t paddr, size_t size, int prot);
+extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ca409be..35ed096 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 
 	while (gfn < end_gfn) {
 		unsigned long unmap_pages;
-		int order;
+		size_t size;
 
 		/* Get physical address */
 		phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
 		pfn  = phys >> PAGE_SHIFT;
 
 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
-		unmap_pages = 1ULL << order;
+		size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+		unmap_pages = 1ULL << get_order(size);
 
 		/* Unpin all pages we just unmapped to not leak any memory */
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-10-17 11:27   ` Ohad Ben-Cohen
  0 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen,
	kvm

When mapping a memory region, split it to page sizes as supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce the TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.

This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.

Currently the supported page sizes are advertised once and they then
remain static. That works well for OMAP and MSM but it would probably
not fly well with intel's hardware, where the page size capabilities
seem to have the potential to be different between several DMA
remapping devices.

register_iommu() currently sets a default pgsize behavior, so we can convert
the IOMMU drivers in subsequent patches. After all the drivers
are converted, the temporary default settings will be removed.

Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted
to deal with bytes instead of page order.

Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
---
 drivers/iommu/iommu.c      |  131 +++++++++++++++++++++++++++++++++++++++-----
 drivers/iommu/omap-iovmm.c |   17 ++----
 include/linux/iommu.h      |   20 ++++++-
 virt/kvm/iommu.c           |    8 +-
 4 files changed, 144 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5d042e8..91df958 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -16,6 +16,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#define pr_fmt(fmt)    "%s: " fmt, __func__
+
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
@@ -47,6 +49,16 @@ int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;
 
+	/*
+	 * Set the default pgsize values, which retain the existing
+	 * IOMMU API behavior: drivers will be called to map
+	 * regions that are sized/aligned to order of 4KiB pages.
+	 *
+	 * This will be removed once all drivers are migrated.
+	 */
+	if (!ops->pgsize_bitmap)
+		ops->pgsize_bitmap = ~0xFFFUL;
+
 	bus->iommu_ops = ops;
 
 	/* Do IOMMU specific setup for this bus-type */
@@ -157,35 +169,126 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
 
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, int gfp_order, int prot)
+	      phys_addr_t paddr, size_t size, int prot)
 {
-	size_t size;
+	unsigned long orig_iova = iova;
+	unsigned int min_pagesz;
+	size_t orig_size = size;
+	int ret = 0;
 
 	if (unlikely(domain->ops->map == NULL))
 		return -ENODEV;
 
-	size         = PAGE_SIZE << gfp_order;
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
+
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
+			"0x%x\n", iova, (unsigned long)paddr,
+			(unsigned long)size, min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
+				(unsigned long)paddr, (unsigned long)size);
+
+	while (size) {
+		unsigned long pgsize, addr_merge = iova | paddr;
+		unsigned int pgsize_idx;
+
+		/* Max page size that still fits into 'size' */
+		pgsize_idx = __fls(size);
+
+		/* need to consider alignment requirements ? */
+		if (likely(addr_merge)) {
+			/* Max page size allowed by both iova and paddr */
+			unsigned int align_pgsize_idx = __ffs(addr_merge);
+
+			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+		}
+
+		/* build a mask of acceptable page sizes */
+		pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+		/* throw away page sizes not supported by the hardware */
+		pgsize &= domain->ops->pgsize_bitmap;
 
-	BUG_ON(!IS_ALIGNED(iova | paddr, size));
+		/* make sure we're still sane */
+		BUG_ON(!pgsize);
 
-	return domain->ops->map(domain, iova, paddr, size, prot);
+		/* pick the biggest page */
+		pgsize_idx = __fls(pgsize);
+		pgsize = 1UL << pgsize_idx;
+
+		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
+					(unsigned long)paddr, pgsize);
+
+		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_unmap(domain, orig_iova, orig_size - size);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
-int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
+size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	size_t size, unmapped;
+	size_t unmapped_page, unmapped = 0;
+	unsigned int min_pagesz;
 
 	if (unlikely(domain->ops->unmap == NULL))
 		return -ENODEV;
 
-	size         = PAGE_SIZE << gfp_order;
-
-	BUG_ON(!IS_ALIGNED(iova, size));
-
-	unmapped = domain->ops->unmap(domain, iova, size);
-
-	return get_order(unmapped);
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
+
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n",
+					iova, (unsigned long)size, min_pagesz);
+		return -EINVAL;
+	}
+
+	pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova,
+							(unsigned long)size);
+
+	/*
+	 * Keep iterating until we either unmap 'size' bytes (or more)
+	 * or we hit an area that isn't mapped.
+	 */
+	while (unmapped < size) {
+		size_t left = size - unmapped;
+
+		unmapped_page = domain->ops->unmap(domain, iova, left);
+		if (!unmapped_page)
+			break;
+
+		pr_debug("unmapped: iova 0x%lx size %lx\n", iova,
+					(unsigned long)unmapped_page);
+
+		iova += unmapped_page;
+		unmapped += unmapped_page;
+	}
+
+	return unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index e8fdb88..0b7b14c 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 	unsigned int i, j;
 	struct scatterlist *sg;
 	u32 da = new->da_start;
-	int order;
 
 	if (!domain || !sgt)
 		return -EINVAL;
@@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new,
 		if (bytes_to_iopgsz(bytes) < 0)
 			goto err_out;
 
-		order = get_order(bytes);
-
 		pr_debug("%s: [%d] %08x %08x(%x)\n", __func__,
 			 i, da, pa, bytes);
 
-		err = iommu_map(domain, da, pa, order, flags);
+		err = iommu_map(domain, da, pa, bytes, flags);
 		if (err)
 			goto err_out;
 
@@ -448,10 +445,9 @@ err_out:
 		size_t bytes;
 
 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);
 
 		/* ignore failures.. we're already handling one */
-		iommu_unmap(domain, da, order);
+		iommu_unmap(domain, da, bytes);
 
 		da += bytes;
 	}
@@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	size_t total = area->da_end - area->da_start;
 	const struct sg_table *sgt = area->sgt;
 	struct scatterlist *sg;
-	int i, err;
+	int i;
+	size_t unmapped;
 
 	BUG_ON(!sgtable_ok(sgt));
 	BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE));
@@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj,
 	start = area->da_start;
 	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
 		size_t bytes;
-		int order;
 
 		bytes = sg->length + sg->offset;
-		order = get_order(bytes);
 
-		err = iommu_unmap(domain, start, order);
-		if (err < 0)
+		unmapped = iommu_unmap(domain, start, bytes);
+		if (unmapped < bytes)
 			break;
 
 		dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n",
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6b6ed21..1f0f992 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,19 @@ struct iommu_domain {
 
 #ifdef CONFIG_IOMMU_API
 
+/**
+ * struct iommu_ops - iommu ops and capabilities
+ * @domain_init: init iommu domain
+ * @domain_destroy: destroy iommu domain
+ * @attach_dev: attach device to an iommu domain
+ * @detach_dev: detach device from an iommu domain
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @iova_to_phys: translate iova to physical address
+ * @domain_has_cap: domain capabilities query
+ * @commit: commit iommu domain
+ * @pgsize_bitmap: bitmap of supported page sizes
+ */
 struct iommu_ops {
 	int (*domain_init)(struct iommu_domain *domain);
 	void (*domain_destroy)(struct iommu_domain *domain);
@@ -62,6 +75,7 @@ struct iommu_ops {
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
 	void (*commit)(struct iommu_domain *domain);
+	unsigned long pgsize_bitmap;
 };
 
 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
@@ -73,9 +87,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-		     phys_addr_t paddr, int gfp_order, int prot);
-extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       int gfp_order);
+		     phys_addr_t paddr, size_t size, int prot);
+extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+		       size_t size);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
 				      unsigned long iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ca409be..35ed096 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 		/* Map into IO address space */
 		r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
-			      get_order(page_size), flags);
+			      page_size, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%llx\n", pfn);
@@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 
 	while (gfn < end_gfn) {
 		unsigned long unmap_pages;
-		int order;
+		size_t size;
 
 		/* Get physical address */
 		phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
 		pfn  = phys >> PAGE_SHIFT;
 
 		/* Unmap address from IO address space */
-		order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
-		unmap_pages = 1ULL << order;
+		size       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+		unmap_pages = 1ULL << get_order(size);
 
 		/* Unpin all pages we just unmapped to not leak any memory */
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread
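
A minimal user-space sketch of the page-size selection step above (not
part of the patch; fls_idx()/ffs_idx() merely stand in for the kernel's
__fls()/__ffs(), and the bitmap and addresses are made-up examples).
It assumes iova, paddr and size are already aligned to the minimum
supported page size, which iommu_map() checks up front:

#include <stdio.h>

/* bit index of the highest / lowest set bit, like __fls() / __ffs() */
static unsigned int fls_idx(unsigned long x)
{
	unsigned int i = 0;

	while (x >>= 1)
		i++;
	return i;
}

static unsigned int ffs_idx(unsigned long x)
{
	unsigned int i = 0;

	while (!(x & 1UL)) {
		x >>= 1;
		i++;
	}
	return i;
}

int main(void)
{
	/* example bitmap: 4KiB | 64KiB | 1MiB | 16MiB */
	unsigned long pgsize_bitmap = 0x1000UL | 0x10000UL | 0x100000UL | 0x1000000UL;
	unsigned long iova = 0x20000000UL, paddr = 0x81000000UL;
	unsigned long size = 0x1010000UL;	/* 16MiB + 64KiB */

	while (size) {
		unsigned long addr_merge = iova | paddr, pgsize;
		unsigned int pgsize_idx = fls_idx(size);	/* biggest page that fits 'size' */

		if (addr_merge) {	/* biggest page the iova/paddr alignment allows */
			unsigned int align_idx = ffs_idx(addr_merge);

			if (align_idx < pgsize_idx)
				pgsize_idx = align_idx;
		}

		pgsize = (1UL << (pgsize_idx + 1)) - 1;	/* candidate sizes up to that limit */
		pgsize &= pgsize_bitmap;		/* keep only supported sizes */
		/* pick the biggest one; nonzero here because everything is 4KiB aligned */
		pgsize = 1UL << fls_idx(pgsize);

		printf("map iova 0x%lx -> pa 0x%lx with a 0x%lx page\n", iova, paddr, pgsize);
		iova += pgsize;
		paddr += pgsize;
		size -= pgsize;
	}
	return 0;
}

Running it splits the 16MiB + 64KiB region into one 16MiB page followed
by one 64KiB page, mirroring what the iommu_map() loop in the patch
above would hand to the driver.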

* [PATCH v4 3/7] iommu/omap: announce supported page sizes
  2011-10-17 11:27 ` Ohad Ben-Cohen
  (?)
@ 2011-10-17 11:27   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen

Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes.

This way the IOMMU core can split any arbitrarily sized, physically
contiguous region (that it needs to map) as needed.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
---
 drivers/iommu/omap-iommu.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 1bb2971..1692b6a 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -33,6 +33,9 @@
 	     (__i < (n)) && (cr = __iotlb_read_cr((obj), __i), true);	\
 	     __i++)
 
+/* bitmap of the page sizes currently supported */
+#define OMAP_IOMMU_PGSIZES	(SZ_4K | SZ_64K | SZ_1M | SZ_16M)
+
 /**
  * struct omap_iommu_domain - omap iommu domain
  * @pgtable:	the page table
@@ -1207,6 +1210,7 @@ static struct iommu_ops omap_iommu_ops = {
 	.unmap		= omap_iommu_unmap,
 	.iova_to_phys	= omap_iommu_iova_to_phys,
 	.domain_has_cap	= omap_iommu_domain_has_cap,
+	.pgsize_bitmap	= OMAP_IOMMU_PGSIZES,
 };
 
 static int __init omap_iommu_init(void)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread
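
As a quick illustration of what this bitmap buys on the core side (a
user-space sketch, not kernel code; the constants below simply spell
out SZ_4K, SZ_64K, SZ_1M and SZ_16M numerically): the minimum page size
seen by the core is the lowest bit set in the bitmap, i.e. 4KiB, and
any map/unmap request whose iova or size is not 4KiB-aligned is
rejected before the driver is ever called.

#include <stdio.h>

int main(void)
{
	/* OMAP_IOMMU_PGSIZES, spelled out numerically */
	unsigned long pgsize_bitmap = 0x1000UL | 0x10000UL | 0x100000UL | 0x1000000UL;
	/* lowest set bit, the same value as the core's 1 << __ffs(pgsize_bitmap) */
	unsigned long min_pagesz = pgsize_bitmap & ~(pgsize_bitmap - 1);
	unsigned long iova = 0x10000800UL, size = 0x2000UL;	/* iova is only 2KiB aligned */

	printf("min_pagesz: 0x%lx\n", min_pagesz);

	/* the core's IS_ALIGNED(iova | size, min_pagesz) check */
	if ((iova | size) & (min_pagesz - 1))
		printf("rejected: iova 0x%lx size 0x%lx not aligned to 0x%lx\n",
		       iova, size, min_pagesz);
	return 0;
}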

* [PATCH v4 4/7] iommu/msm: announce supported page sizes
  2011-10-17 11:27 ` Ohad Ben-Cohen
  (?)
@ 2011-10-17 11:27   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen

Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes.

This way the IOMMU core can split any arbitrarily sized, physically
contiguous region (that it needs to map) as needed.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Acked-by: David Brown <davidb@codeaurora.org>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
---
 drivers/iommu/msm_iommu.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 13718d9..08a90b8 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -42,6 +42,9 @@ __asm__ __volatile__ (							\
 #define RCP15_PRRR(reg)		MRC(reg, p15, 0, c10, c2, 0)
 #define RCP15_NMRR(reg)		MRC(reg, p15, 0, c10, c2, 1)
 
+/* bitmap of the page sizes currently supported */
+#define MSM_IOMMU_PGSIZES	(SZ_4K | SZ_64K | SZ_1M | SZ_16M)
+
 static int msm_iommu_tex_class[4];
 
 DEFINE_SPINLOCK(msm_iommu_lock);
@@ -679,7 +682,8 @@ static struct iommu_ops msm_iommu_ops = {
 	.map = msm_iommu_map,
 	.unmap = msm_iommu_unmap,
 	.iova_to_phys = msm_iommu_iova_to_phys,
-	.domain_has_cap = msm_iommu_domain_has_cap
+	.domain_has_cap = msm_iommu_domain_has_cap,
+	.pgsize_bitmap = MSM_IOMMU_PGSIZES,
 };
 
 static int __init get_tex_class(int icp, int ocp, int mt, int nos)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v4 5/7] iommu/amd: announce supported page sizes
  2011-10-17 11:27 ` Ohad Ben-Cohen
  (?)
@ 2011-10-17 11:27   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen

Let the IOMMU core know we support arbitrary page sizes (as long as
they're a power-of-two multiple of 4KiB).

This way the IOMMU core will retain the existing behavior we're used to;
it will let us map regions that:
- have a size which is a power-of-two multiple of 4KiB
- are naturally aligned

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
---
 drivers/iommu/amd_iommu.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 32d502e..d654b59 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -41,6 +41,24 @@
 
 #define LOOP_TIMEOUT	100000
 
+/*
+ * This bitmap is used to advertise the page sizes our hardware support
+ * to the IOMMU core, which will then use this information to split
+ * physically contiguous memory regions it is mapping into page sizes
+ * that we support.
+ *
+ * Traditionally the IOMMU core just handed us the mappings directly,
+ * after making sure the size is an order of a 4KiB page and that the
+ * mapping has natural alignment.
+ *
+ * To retain this behavior, we currently advertise that we support
+ * all page sizes that are an order of 4KiB.
+ *
+ * If at some point we'd like to utilize the IOMMU core's new behavior,
+ * we could change this to advertise the real page sizes we support.
+ */
+#define AMD_IOMMU_PGSIZES	(~0xFFFUL)
+
 static DEFINE_RWLOCK(amd_iommu_devtable_lock);
 
 /* A list of preallocated protection domains */
@@ -2779,6 +2797,7 @@ static struct iommu_ops amd_iommu_ops = {
 	.unmap = amd_iommu_unmap,
 	.iova_to_phys = amd_iommu_iova_to_phys,
 	.domain_has_cap = amd_iommu_domain_has_cap,
+	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
 };
 
 /*****************************************************************************
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread
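
A short aside on what ~0xFFFUL expands to (a user-space sketch, not
kernel code): it sets every bit from 12 upward, so it advertises 4KiB,
8KiB, 16KiB and every larger power of two as a supported page size.
For a naturally aligned region whose size is 4KiB << order (exactly
what the old API accepted), the selection steps of iommu_map(),
paraphrased below, pick the whole region as a single page, so the
driver still sees one map call per region:

#include <stdio.h>

int main(void)
{
	unsigned long bitmap = ~0xFFFUL;		/* AMD_IOMMU_PGSIZES */
	unsigned long size = 0x1000UL << 3;		/* old gfp_order 3, i.e. 32KiB */
	unsigned long iova = 5 * size, paddr = 9 * size; /* naturally aligned, as before */
	unsigned long addr_merge = iova | paddr;
	unsigned long align = addr_merge & ~(addr_merge - 1);	/* lowest set bit */
	unsigned long limit = align < size ? align : size;	/* size is a power of two here */
	unsigned long pgsize = ((limit << 1) - 1) & bitmap;	/* candidate sizes, supported only */

	while (pgsize & (pgsize - 1))	/* keep only the highest set bit */
		pgsize &= pgsize - 1;

	printf("driver sees a single map of 0x%lx bytes (requested size 0x%lx)\n",
	       pgsize, size);
	return 0;
}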

* [PATCH v4 6/7] iommu/intel: announce supported page sizes
  2011-10-17 11:27 ` Ohad Ben-Cohen
  (?)
@ 2011-10-17 11:27   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen

Let the IOMMU core know we support arbitrary page sizes (as long as
they're a power-of-two multiple of 4KiB).

This way the IOMMU core will retain the existing behavior we're used to;
it will let us map regions that:
- have a size which is a power-of-two multiple of 4KiB
- are naturally aligned

Note: Intel IOMMU hardware doesn't support arbitrary page sizes,
but the driver does (it splits arbitrarily sized mappings into
the page sizes supported by the hardware).

To make everything simpler for now, though, this patch effectively tells
the IOMMU core to keep giving this driver the same memory regions it did
before, so nothing changes as far as the driver is concerned.

At this point, the announced page sizes remain static within the IOMMU
core. For this driver to correctly utilize the core's pgsize splitting,
some core changes still seem necessary, because Intel's IOMMU page size
capabilities may differ between individual DMA remapping devices.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Woodhouse <dwmw2@infradead.org>
---
 drivers/iommu/intel-iommu.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 96a9bd4..3246b47 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -77,6 +77,24 @@
 #define LEVEL_STRIDE		(9)
 #define LEVEL_MASK		(((u64)1 << LEVEL_STRIDE) - 1)
 
+/*
+ * This bitmap is used to advertise the page sizes our hardware support
+ * to the IOMMU core, which will then use this information to split
+ * physically contiguous memory regions it is mapping into page sizes
+ * that we support.
+ *
+ * Traditionally the IOMMU core just handed us the mappings directly,
+ * after making sure the size is an order of a 4KiB page and that the
+ * mapping has natural alignment.
+ *
+ * To retain this behavior, we currently advertise that we support
+ * all page sizes that are an order of 4KiB.
+ *
+ * If at some point we'd like to utilize the IOMMU core's new behavior,
+ * we could change this to advertise the real page sizes we support.
+ */
+#define INTEL_IOMMU_PGSIZES	(~0xFFFUL)
+
 static inline int agaw_to_level(int agaw)
 {
 	return agaw + 2;
@@ -3907,6 +3925,7 @@ static struct iommu_ops intel_iommu_ops = {
 	.unmap		= intel_iommu_unmap,
 	.iova_to_phys	= intel_iommu_iova_to_phys,
 	.domain_has_cap = intel_iommu_domain_has_cap,
+	.pgsize_bitmap	= INTEL_IOMMU_PGSIZES,
 };
 
 static void __devinit quirk_iommu_rwbf(struct pci_dev *dev)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v4 7/7] iommu/core: remove the temporary pgsize settings
  2011-10-17 11:27 ` Ohad Ben-Cohen
  (?)
@ 2011-10-17 11:27   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-17 11:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-omap, Laurent Pinchart, Joerg Roedel, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, Ohad Ben-Cohen

Now that all IOMMU drivers are exporting their supported pgsizes,
we can remove the default pgsize settings in register_iommu().

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
 drivers/iommu/iommu.c |   10 ----------
 1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 91df958..ff8d524 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -49,16 +49,6 @@ int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
 	if (bus->iommu_ops != NULL)
 		return -EBUSY;
 
-	/*
-	 * Set the default pgsize values, which retain the existing
-	 * IOMMU API behavior: drivers will be called to map
-	 * regions that are sized/aligned to order of 4KiB pages.
-	 *
-	 * This will be removed once all drivers are migrated.
-	 */
-	if (!ops->pgsize_bitmap)
-		ops->pgsize_bitmap = ~0xFFFUL;
-
 	bus->iommu_ops = ops;
 
 	/* Do IOMMU specific setup for this bus-type */
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware
  2011-10-17 11:27 ` Ohad Ben-Cohen
@ 2011-11-08 12:57   ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-11-08 12:57 UTC (permalink / raw)
  To: iommu, Joerg Roedel
  Cc: linux-omap, Laurent Pinchart, David Woodhouse, linux-arm-kernel,
	David Brown, Arnd Bergmann, linux-kernel, Hiroshi Doyu,
	Stepan Moskovchenko, KyongHo Cho

Joerg, gentle reminder. thanks.

On Mon, Oct 17, 2011 at 1:27 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> v3->v4:
> - simplify splitter logic (Joerg)
> - declare supported page-sizes in the iommu_ops, without extending iommu_register (Joerg)
> - iommu_unmap should now return bytes too (Joerg)
> - don't cache min_pgsize anymore (Joerg)
> - handle cases when ->unmap() actually unmaps more than requested (Joerg)
> - unroll iommu_unmap completely in case it fails (KyongHo)
> - use size_t for size parameters (KyongHo)
> - add a patch to remove the bytes->order->bytes conversion we had
> - rabase to master branch of the iommu tree (Joerg)
>
> v2->v3:
> - s/KB/KiB/ (David W)
>
> v1->v2:
> - split to patches (by keeping the old code around until all drivers are converted) (Joerg)
>
> Tested with OMAP3 (omap3isp) and OMAP4 (rpmsg/remoteproc).
> Compile tested with X86_64.
>
> Ohad Ben-Cohen (7):
>  iommu/core: stop converting bytes to page order back and forth
>  iommu/core: split mapping to page sizes as supported by the hardware
>  iommu/omap: announce supported page sizes
>  iommu/msm: announce supported page sizes
>  iommu/amd: announce supported page sizes
>  iommu/intel: announce supported page sizes
>  iommu/core: remove the temporary pgsize settings
>
>  drivers/iommu/amd_iommu.c   |   32 +++++++++---
>  drivers/iommu/intel-iommu.c |   30 ++++++++---
>  drivers/iommu/iommu.c       |  119 ++++++++++++++++++++++++++++++++++++++----
>  drivers/iommu/msm_iommu.c   |   25 ++++-----
>  drivers/iommu/omap-iommu.c  |   18 +++---
>  drivers/iommu/omap-iovmm.c  |   17 ++----
>  include/linux/iommu.h       |   26 +++++++--
>  virt/kvm/iommu.c            |    8 ++--
>  8 files changed, 205 insertions(+), 70 deletions(-)
>
> --
> 1.7.4.1
>
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware
  2011-11-08 12:57   ` Ohad Ben-Cohen
  (?)
@ 2011-11-08 14:01     ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-08 14:01 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho

On Tue, Nov 08, 2011 at 02:57:02PM +0200, Ohad Ben-Cohen wrote:
> Joerg, gentle reminder. thanks.

Just wanted to wait until -rc1 is out and I can start merging
new stuff :)


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware
  2011-11-08 14:01     ` Joerg Roedel
@ 2011-11-08 14:03       ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-11-08 14:03 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho

On Tue, Nov 8, 2011 at 4:01 PM, Joerg Roedel <Joerg.Roedel@amd.com> wrote:
> On Tue, Nov 08, 2011 at 02:57:02PM +0200, Ohad Ben-Cohen wrote:
>> Joerg, gentle reminder. thanks.
>
> Just wanted to wait until -rc1 is out and I can start merging
> new stuff :)

np, thanks :)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware
  2011-11-08 14:03       ` Ohad Ben-Cohen
  (?)
@ 2011-11-08 16:23         ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-08 16:23 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho

On Tue, Nov 08, 2011 at 04:03:09PM +0200, Ohad Ben-Cohen wrote:
> On Tue, Nov 8, 2011 at 4:01 PM, Joerg Roedel <Joerg.Roedel@amd.com> wrote:
> > On Tue, Nov 08, 2011 at 02:57:02PM +0200, Ohad Ben-Cohen wrote:
> >> Joerg, gentle reminder. thanks.
> >
> > Just wanted to wait until -rc1 is out and I can start merging
> > new stuff :)
> 
> np, thanks :)

Does not apply cleanly. Can you please rebase against v3.2-rc1 and
resend?


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware
  2011-11-08 16:23         ` Joerg Roedel
@ 2011-11-08 16:43           ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-11-08 16:43 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-omap, Laurent Pinchart, David Woodhouse,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho

On Tue, Nov 8, 2011 at 6:23 PM, Joerg Roedel <Joerg.Roedel@amd.com> wrote:
> Does not apply cleanly. Can you please rebase against v3.2-rc1 and
> resend?

sure!

will refresh and resend.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-17 11:27   ` Ohad Ben-Cohen
@ 2011-11-10  6:17     ` Kai Huang
  -1 siblings, 0 replies; 81+ messages in thread
From: Kai Huang @ 2011-11-10  6:17 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho,
	kvm

>
> -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order)
> +size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
> -       size_t size, unmapped;
> +       size_t unmapped_page, unmapped = 0;
> +       unsigned int min_pagesz;
>
>        if (unlikely(domain->ops->unmap == NULL))
>                return -ENODEV;
>
> -       size         = PAGE_SIZE << gfp_order;
> -
> -       BUG_ON(!IS_ALIGNED(iova, size));
> -
> -       unmapped = domain->ops->unmap(domain, iova, size);
> -
> -       return get_order(unmapped);
> +       /* find out the minimum page size supported */
> +       min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
> +
> +       /*
> +        * The virtual address, as well as the size of the mapping, must be
> +        * aligned (at least) to the size of the smallest page supported
> +        * by the hardware
> +        */
> +       if (!IS_ALIGNED(iova | size, min_pagesz)) {
> +               pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n",
> +                                       iova, (unsigned long)size, min_pagesz);
> +               return -EINVAL;
> +       }
> +
> +       pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova,
> +                                                       (unsigned long)size);
> +
> +       /*
> +        * Keep iterating until we either unmap 'size' bytes (or more)
> +        * or we hit an area that isn't mapped.
> +        */
> +       while (unmapped < size) {
> +               size_t left = size - unmapped;
> +
> +               unmapped_page = domain->ops->unmap(domain, iova, left);
> +               if (!unmapped_page)
> +                       break;
> +
> +               pr_debug("unmapped: iova 0x%lx size %lx\n", iova,
> +                                       (unsigned long)unmapped_page);
> +
> +               iova += unmapped_page;
> +               unmapped += unmapped_page;
> +       }
> +
> +       return unmapped;
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
>

Seems the unmap function doesn't take phys as a parameter; does this mean
domain->ops->unmap will walk through the page table to find out the
actual page size?

And another question: have we considered the IOTLB flush operation? I
think we need to implement similar logic when flushing the DVMA range.
Intel VT-d's manual says software needs to specify the appropriate
mask value to flush large pages, but it does not say we need to
exactly match the page size as it was mapped. I guess it's not
necessary for the Intel IOMMU, but other vendors' IOMMUs may have such
a limitation (or some other limitations). In my understanding, the
current implementation does not provide page-size information for
particular DVMA ranges that have been mapped, and it's not flexible
enough to implement IOTLB flush code (e.g., we may need to walk through
the page table to find out the actual page size). Maybe we can also add
iommu_ops->flush_iotlb?
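
For reference, a minimal standalone sketch of the mask computation in
question: a page-selective flush covers a power-of-two number of 4 KiB
pages, so the chosen order has to be at least large enough to span the
page size that was actually mapped. The helper name and the numbers are
invented for illustration; this is not the Intel driver's code.

#include <stdio.h>

#define SZ_4K   (1UL << 12)

/* smallest power-of-two number of 4 KiB pages covering 'size' bytes */
static unsigned int flush_mask_order(unsigned long size)
{
        unsigned int order = 0;

        while ((SZ_4K << order) < size)
                order++;
        return order;
}

int main(void)
{
        /* a 2 MiB large page needs order 9 (512 x 4 KiB), not order 0 */
        printf("4K -> order %u\n", flush_mask_order(1UL << 12));
        printf("2M -> order %u\n", flush_mask_order(1UL << 21));
        printf("1G -> order %u\n", flush_mask_order(1UL << 30));
        return 0;
}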

-cody

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10  6:17     ` Kai Huang
@ 2011-11-10  7:31       ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 81+ messages in thread
From: Ohad Ben-Cohen @ 2011-11-10  7:31 UTC (permalink / raw)
  To: Kai Huang
  Cc: iommu, linux-omap, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho,
	kvm

On Thu, Nov 10, 2011 at 8:17 AM, Kai Huang <mail.kai.huang@gmail.com> wrote:
> Seems the unmap function don't take phys as parameter, does this mean
> domain->ops->unmap will walk through the page table to find out the
> actual page size?

The short answer is yes, and furthermore, we are also considering removing
the size param from domain->ops->unmap entirely at some point.

We had a long discussion about it, please see:

https://lkml.org/lkml/2011/10/10/234
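
To illustrate, a toy user-space sketch of that lookup (all names are
invented; this is not any real driver): the driver-side unmap callback
discovers the page size from its own page-table state and returns the
number of bytes it actually unmapped, which is what the generic
iommu_unmap() loop then iterates on.

#include <stdio.h>
#include <stddef.h>

struct toy_mapping {
        unsigned long iova;
        size_t        size;     /* page size chosen at map time */
        int           valid;
};

static struct toy_mapping toy_pgtable[] = {
        { 0x100000, 0x10000, 1 },       /* one 64 KiB page */
        { 0x110000, 0x1000,  1 },       /* one 4 KiB page  */
};

/* ->unmap()-like callback: 'len' is an upper bound, the table decides */
static size_t toy_unmap(unsigned long iova, size_t len)
{
        size_t i;

        for (i = 0; i < sizeof(toy_pgtable) / sizeof(toy_pgtable[0]); i++) {
                struct toy_mapping *m = &toy_pgtable[i];

                if (m->valid && m->iova == iova && m->size <= len) {
                        m->valid = 0;
                        return m->size;         /* bytes actually unmapped */
                }
        }
        return 0;                               /* nothing mapped here */
}

int main(void)
{
        /* caller asks for 0x11000 bytes; the unmap proceeds page by page */
        unsigned long iova = 0x100000;
        size_t want = 0x11000, done = 0, step;

        while (done < want && (step = toy_unmap(iova, want - done)) != 0) {
                printf("unmapped 0x%zx bytes at 0x%lx\n", step, iova);
                iova += step;
                done += step;
        }
        return 0;
}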

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10  7:31       ` Ohad Ben-Cohen
@ 2011-11-10 12:16         ` cody
  -1 siblings, 0 replies; 81+ messages in thread
From: cody @ 2011-11-10 12:16 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: iommu, linux-omap, Laurent Pinchart, Joerg Roedel,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho,
	kvm

On 11/10/2011 03:31 PM, Ohad Ben-Cohen wrote:
> On Thu, Nov 10, 2011 at 8:17 AM, Kai Huang<mail.kai.huang@gmail.com>  wrote:
>    
>> Seems the unmap function don't take phys as parameter, does this mean
>> domain->ops->unmap will walk through the page table to find out the
>> actual page size?
>>      
> The short answer is yes, and furthermore, we also consider to remove
> the size param from domain->ops->unmap entirely at some point.
>
> We had a long discussion about it, please see:
>
> https://lkml.org/lkml/2011/10/10/234
>    
Yes, I've seen your discussion; I have followed this thread from the beginning :)

How about the IOTLB flush? As I said, I think we need to consider that an
IOMMU (even one that does not exist yet) may have some limitation on IOTLB
flushes, and hiding the page size from the IOTLB flush code may hurt
performance or, even worse, trigger undefined behavior.

-cody

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 12:16         ` cody
@ 2011-11-10 13:08           ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-10 13:08 UTC (permalink / raw)
  To: cody
  Cc: Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho,
	kvm

On Thu, Nov 10, 2011 at 08:16:16PM +0800, cody wrote:
> On 11/10/2011 03:31 PM, Ohad Ben-Cohen wrote:
> >On Thu, Nov 10, 2011 at 8:17 AM, Kai Huang<mail.kai.huang@gmail.com>  wrote:
> >>Seems the unmap function don't take phys as parameter, does this mean
> >>domain->ops->unmap will walk through the page table to find out the
> >>actual page size?
> >The short answer is yes, and furthermore, we also consider to remove
> >the size param from domain->ops->unmap entirely at some point.
> >
> >We had a long discussion about it, please see:
> >
> >https://lkml.org/lkml/2011/10/10/234
> Yes I've seen your discussion, I followed this thread from beginning:)
> 
> How about the IOTLB flush? As I said I think we need to consider
> that IOMMU (even does not exist now) may have some limitation on
> IOTLB flush, and hiding page size from IOTLB flush code may hurt
> performance, or even worse, trigger undefined behaviors.

We can only care about IOMMUs that exist today or ones that will exist
and that we already know of.
In general, for the hardware I know of, a page-size is not required for
implementing unmap operations. Requiring this would imply that any user
of the IOMMU-API needs to keep track of the page-sizes used to map a
given area.
This would be a huge burden which is not really necessary, because the
IOMMU driver already has this information and can return it to the user.
So if you want to change that, you need a very good reason for it.
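
As a kernel-style sketch of what that means for an IOMMU-API user
(illustrative only: it assumes the size_t-based map/unmap prototypes
from this series, and the surrounding function and error handling are
made up):

#include <linux/iommu.h>
#include <linux/errno.h>

static int example_remap(struct iommu_domain *domain, unsigned long iova,
                         phys_addr_t paddr, size_t size, int prot)
{
        size_t unmapped;
        int ret;

        /* map without caring which page sizes the driver picks */
        ret = iommu_map(domain, iova, paddr, size, prot);
        if (ret)
                return ret;

        /* ... use the mapping ... */

        /* unmap by (iova, size) only; the driver reports what it undid */
        unmapped = iommu_unmap(domain, iova, size);
        if (unmapped < size)
                return -EIO;

        return 0;
}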


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 13:08           ` Joerg Roedel
@ 2011-11-10 14:35             ` cody
  -1 siblings, 0 replies; 81+ messages in thread
From: cody @ 2011-11-10 14:35 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho,
	kvm

On 11/10/2011 09:08 PM, Joerg Roedel wrote:
> On Thu, Nov 10, 2011 at 08:16:16PM +0800, cody wrote:
>    
>> On 11/10/2011 03:31 PM, Ohad Ben-Cohen wrote:
>>      
>>> On Thu, Nov 10, 2011 at 8:17 AM, Kai Huang<mail.kai.huang@gmail.com>   wrote:
>>>        
>>>> Seems the unmap function don't take phys as parameter, does this mean
>>>> domain->ops->unmap will walk through the page table to find out the
>>>> actual page size?
>>>>          
>>> The short answer is yes, and furthermore, we also consider to remove
>>> the size param from domain->ops->unmap entirely at some point.
>>>
>>> We had a long discussion about it, please see:
>>>
>>> https://lkml.org/lkml/2011/10/10/234
>>>        
>> Yes I've seen your discussion, I followed this thread from beginning:)
>>
>> How about the IOTLB flush? As I said I think we need to consider
>> that IOMMU (even does not exist now) may have some limitation on
>> IOTLB flush, and hiding page size from IOTLB flush code may hurt
>> performance, or even worse, trigger undefined behaviors.
>>      
> We can only care about IOMMUs that exist today or ones that will exist
> and we already know of.
> In general for the hardware I know of a page-size is not required for
> implementing unmap operations. Requiring this would imply that any user
> of the IOMMU-API needs to keeps track of the page-sizes used to map a
> given area.
> This would be a huge burden which is not really necessary because the
> IOMMU driver already has this information and can return it to the user.
> So if you want to change that you need a very good reason for it.
>
>    
Yes, I totally agree that the page-size is not required for unmap operations
and should not be added as a parameter to the map/unmap operations. I am not
talking about the unmap operation, but the IOTLB flush operation. My point is
that we may also need to add similar logic in the IOTLB flush code (such as
in the Intel IOMMU driver) to guarantee that, when issuing an IOTLB flush
command for a large page, we still meet the hardware limitations on flushing
large pages. It seems that for the Intel IOMMU the only limitation is that
the mask value (indicating the number of 4k pages) cannot be smaller than
the value needed to cover the large page, and the current Intel IOMMU driver
code seems to cover this case well. I am not familiar with how the AMD IOMMU
issues its IOTLB flush commands, but it should be able to handle this
large-page case too. So at this moment, this patch should not have any
issues :)

-cody
> 	Joerg
>
>    


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 14:35             ` cody
@ 2011-11-10 14:51               ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-10 14:51 UTC (permalink / raw)
  To: cody
  Cc: Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	David Woodhouse, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho,
	kvm

On Thu, Nov 10, 2011 at 10:35:34PM +0800, cody wrote:
> Yes I totally agree page-size is not required for unmap operations
> and should not be added as parameter to map/unmap operations. I am
> not saying the unmap operation, but the IOTLB flush operation. My
> point is we also may also need to add similar logic in IOTLB flush
> code (such as in Intel IOMMU dirver) to grantee that when issuing
> IOTLB flush command for large page, we will still meet the hardware
> limitation of flushing large page. Seems for Intel IOMMU the only
> limitation is the mask value (indicating number of 4k-pages) cannot
> be smaller than the value to cover large page, and seems current
> Intel IOMMU driver code has covered this case well. I am not
> familiar with how AMD IOMMU issues IOTLB flush command, it should be
> able to handle this large page case too. So at this moment, this
> patch should not have any issues :)

The map-operation actually takes a size, as it should. The idea is to
change this function to a map_page interface which takes a page-size
parameter, but that's another story.
The IOTLB flushing is not exposed by the IOMMU-API anyway. Whatever is
necessary to do that is the business of the IOMMU driver. So in the
unmap-path the driver finds out the page-size to unmap and can
immediately flush the IOTLB for that page.


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10  6:17     ` Kai Huang
@ 2011-11-10 15:28       ` David Woodhouse
  -1 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2011-11-10 15:28 UTC (permalink / raw)
  To: Kai Huang
  Cc: Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	Joerg Roedel, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho,
	kvm

On Thu, 2011-11-10 at 14:17 +0800, Kai Huang wrote:
> And another question: have we considered the IOTLB flush operation? I
> think we need to implement similar logic when flush the DVMA range.
> Intel VT-d's manual says software needs to specify the appropriate
> mask value to flush large pages, but it does not say we need to
> exactly match the page size as it is mapped. I guess it's not
> necessary for Intel IOMMU, but other vendor's IOMMU may have such
> limitation (or some other limitations). In my understanding current
> implementation does not provide page size information for particular
> DVMA ranges that has been mapped, and it's not flexible to implement
> IOTLB flush code (ex, we may need to walk through page table to find
> out actual page size). Maybe we can also add iommu_ops->flush_iotlb ? 

Which brings me to another question I have been pondering... do we even
have a consensus on exactly *when* the IOTLB should be flushed?

Even just for the Intel IOMMU, we have three different behaviours:

  - For DMA API users by default, we do 'batched unmap', so a mapping
    may be active for a period of time after the driver has requested
    that it be unmapped.

  - ... unless booted with 'intel_iommu=strict', in which case we do the
    unmap and IOTLB flush immediately before returning to the driver.

  - But the IOMMU API for virtualisation is different. In fact that
    doesn't seem to flush the IOTLB at all. Which is probably a bug.

What is acceptable, though? That batched unmap is quite important for
performance, because it means that we don't have to bash on the hardware
and wait for a flush to complete in the fast path of network driver RX,
for example.

If we move to a model where we have a separate ->flush_iotlb() call, we
need to be careful that we still allow necessary optimisations to
happen.

Since I have the right people on Cc and the iommu list is still down,
and it's vaguely tangentially related...

I'm looking at fixing performance issues in the Intel IOMMU code, with
its virtual address space allocation (the rbtree-based one in iova.c
that nobody else uses, which has a single spinlock that *all* CPUs bash
on when they need to allocate).

The plan is, vaguely, to allocate large chunks of space to each CPU, and
then for each CPU to allocate from its own region first, thus ensuring
that the common case doesn't bounce locks between CPUs. It'll be rare
for one CPU to have to touch a subregion 'belonging' to another CPU, so
lock contention should be drastically reduced.
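
A toy sketch of that allocation policy, with purely invented data
structures (no relation to iova.c): each CPU bump-allocates from its
own subrange first and only then carves a new subrange out of the
global space; stealing from other CPUs' subranges is left out here.

#include <stdio.h>

#define NCPUS        4
#define SPACE_START  0x100000UL
#define SPACE_END    0x500000UL
#define CHUNK_SIZE   0x100000UL         /* 1 MiB subrange per grab */

struct subrange {
        unsigned long cur, end;         /* simple bump allocator */
        /* in the real thing a per-subrange lock would live here */
};

static struct subrange cpu_range[NCPUS];
static unsigned long next_chunk = SPACE_START;

static unsigned long alloc_iova(int cpu, unsigned long size)
{
        struct subrange *r = &cpu_range[cpu];

        /* fast path: this CPU's own subrange */
        if (r->cur + size <= r->end) {
                unsigned long iova = r->cur;

                r->cur += size;
                return iova;
        }

        /* slow path: carve a fresh subrange out of the global space */
        if (size <= CHUNK_SIZE && next_chunk + CHUNK_SIZE <= SPACE_END) {
                unsigned long iova = next_chunk;

                r->cur = iova + size;
                r->end = iova + CHUNK_SIZE;
                next_chunk += CHUNK_SIZE;
                return iova;
        }

        return 0;       /* would fall back to other CPUs' subranges */
}

int main(void)
{
        printf("cpu0: 0x%lx\n", alloc_iova(0, 0x1000));
        printf("cpu1: 0x%lx\n", alloc_iova(1, 0x1000));
        printf("cpu0: 0x%lx\n", alloc_iova(0, 0x1000));
        return 0;
}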

Should I be planning to drop the DMA API support from intel-iommu.c
completely, and have the new allocator just call into the IOMMU API
functions instead? Other people have been looking at that, haven't they?
Is there any code? Or special platform-specific requirements for such a
generic wrapper that I might not have thought of? Details about when to
flush the IOTLB are one such thing which might need special handling for
certain hardware...

-- 
dwmw2


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 15:28       ` David Woodhouse
@ 2011-11-10 17:09         ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-10 17:09 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kai Huang, Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, kvm

On Thu, Nov 10, 2011 at 03:28:50PM +0000, David Woodhouse wrote:

> Which brings me to another question I have been pondering... do we even
> have a consensus on exactly *when* the IOTLB should be flushed?

Well, sort of, there is still the outstanding idea of the
iommu_commit() interface for the IOMMU-API.

> Even just for the Intel IOMMU, we have three different behaviours:
> 
>   - For DMA API users by default, we do 'batched unmap', so a mapping
>     may be active for a period of time after the driver has requested
>     that it be unmapped.

The requirement for the DMA-API is that the IOTLB must be consistent
with existing mappings, and only with the parts that are really mapped.
The unmapped parts are not important.

This allows nice optimizations like your 'batched unmap' on the Intel
IOMMU driver side. The AMD IOMMU driver uses a round-robin bitmap
allocator for the IO addresses which makes it very easy to flush certain
IOTLB ranges only before they are reused.
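
A toy sketch of that round-robin behaviour (simplified to a bump
pointer with invented names; the real driver uses a bitmap): the IOTLB
only needs flushing when the allocator wraps around and old addresses
are about to be reused.

#include <stdio.h>

#define RANGE_START  0x100000UL
#define RANGE_END    0x140000UL        /* 64 pages of IOVA space */
#define PAGE_SIZE    4096UL

static unsigned long next_iova = RANGE_START;

static void flush_iotlb_range(unsigned long start, unsigned long end)
{
        printf("IOTLB flush 0x%lx-0x%lx\n", start, end);
}

static unsigned long rr_alloc(void)
{
        unsigned long iova;

        if (next_iova >= RANGE_END) {
                /* wrap-around: old addresses are about to be reused */
                flush_iotlb_range(RANGE_START, RANGE_END);
                next_iova = RANGE_START;
        }
        iova = next_iova;
        next_iova += PAGE_SIZE;
        return iova;
}

int main(void)
{
        int i;

        /* 70 single-page allocations force exactly one wrap and flush */
        for (i = 0; i < 70; i++)
                rr_alloc();
        return 0;
}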

>   - ... unless booted with 'intel_iommu=strict', in which case we do the
>     unmap and IOTLB flush immediately before returning to the driver.

There is something similar on the AMD IOMMU side. There it is called
unmap_flush.

>   - But the IOMMU API for virtualisation is different. In fact that
>     doesn't seem to flush the IOTLB at all. Which is probably a bug.

Well, the *current* requirement is that the IOTLB is in sync with the
page-table at all times. This is true for the iommu_map and especially
for the iommu_unmap function. It means basically that the unmapped area
needs to be flushed out of the IOTLBs before iommu_unmap returns.

Some time ago I proposed the iommu_commit() interface which changes
these requirements. With this interface the requirement is that after a
couple of map/unmap operations the IOMMU-API user has to call
iommu_commit() to make these changes visible to the hardware (so mostly
sync the IOTLBs). As discussed at that time this would make sense for
the Intel and AMD IOMMU drivers.
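
A sketch of how the proposed interface would be used by an IOMMU-API
user; iommu_commit() does not exist yet and is hypothetical here, while
iommu_unmap() matches the prototype from this series:

#include <linux/iommu.h>

/* iommu_commit() is the proposed interface, not an existing function */
static void example_batched_teardown(struct iommu_domain *domain,
                                     unsigned long iova, size_t page,
                                     int npages)
{
        int i;

        for (i = 0; i < npages; i++)
                iommu_unmap(domain, iova + i * page, page);

        /* one IOTLB sync for the whole batch instead of one per page */
        iommu_commit(domain);
}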

> What is acceptable, though? That batched unmap is quite important for
> performance, because it means that we don't have to bash on the hardware
> and wait for a flush to complete in the fast path of network driver RX,
> for example.

Have you considered a round-robin bitmap-allocator? It allows quite nice
flushing behavior.

> If we move to a model where we have a separate ->flush_iotlb() call, we
> need to be careful that we still allow necessary optimisations to
> happen.

With iommu_commit() this should be possible, still.

> I'm looking at fixing performance issues in the Intel IOMMU code, with
> its virtual address space allocation (the rbtree-based one in iova.c
> that nobody else uses, which has a single spinlock that *all* CPUs bash
> on when they need to allocate).
> 
> The plan is, vaguely, to allocate large chunks of space to each CPU, and
> then for each CPU to allocate from its own region first, thus ensuring
> that the common case doesn't bounce locks between CPUs. It'll be rare
> for one CPU to have to touch a subregion 'belonging' to another CPU, so
> lock contention should be drastically reduced.

That's an interesting issue. It exists on the AMD IOMMU side too; the
bitmap-allocator runs in a per-domain spinlock which can get high
contention. I am not sure how per-cpu chunks of the address space scale
to large numbers of cpus, though, given that some devices only have a
small address range that they can address.

I have been thinking about some lockless algorithms for the
bitmap-allocator. But the ideas are not finalized yet, so I still don't
know if they will work out at all :)
The basic idea builds around the fact, that most allocations using the
DMA-API fit into one page. So probably we can split the address-space
into a region for one-page allocations which can be accessed without
locks and another region for larger allocations which still need locks.
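
A standalone sketch of that split, under the stated assumption that
most allocations are a single page (invented names, no relation to any
real driver): single-page allocations come from a region managed with
atomic operations and need no spinlock, while larger allocations would
still go through the locked path, which is only stubbed out here.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE      4096UL
#define ONEPAGE_BASE   0x100000UL
#define ONEPAGE_PAGES  64

static atomic_bool onepage_used[ONEPAGE_PAGES];

static unsigned long alloc_one_page(void)
{
        int i;

        for (i = 0; i < ONEPAGE_PAGES; i++)
                if (!atomic_exchange(&onepage_used[i], true))
                        return ONEPAGE_BASE + i * PAGE_SIZE;

        return 0;       /* region full: fall back to the locked path */
}

static void free_one_page(unsigned long iova)
{
        atomic_store(&onepage_used[(iova - ONEPAGE_BASE) / PAGE_SIZE], false);
}

int main(void)
{
        unsigned long a = alloc_one_page();
        unsigned long b = alloc_one_page();

        printf("a=0x%lx b=0x%lx\n", a, b);
        free_one_page(a);
        printf("reused: 0x%lx\n", alloc_one_page());    /* gets 'a' back */
        return 0;
}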

> Should I be planning to drop the DMA API support from intel-iommu.c
> completely, and have the new allocator just call into the IOMMU API
> functions instead? Other people have been looking at that, haven't they?

Yes, Marek Szyprowski from the ARM side is looking into this already,
but his patches are very ARM specific and not suitable for x86 yet.

> Is there any code? Or special platform-specific requirements for such a
> generic wrapper that I might not have thought of? Details about when to
> flush the IOTLB are one such thing which might need special handling for
> certain hardware...

The plan is to have a single DMA-API implementation for all IOMMU
drivers (X86 and ARM) which just uses the IOMMU-API. But to make this
perform reasonably well, a few changes to the IOMMU-API are required.
I already have some ideas which we can discuss if you want.


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 17:09         ` Joerg Roedel
@ 2011-11-10 19:28           ` David Woodhouse
  -1 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2011-11-10 19:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Kai Huang, Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, kvm

On Thu, 2011-11-10 at 18:09 +0100, Joerg Roedel wrote:
> The requirement for the DMA-API is, that the IOTLB must be consistent
> with existing mappings, and only with the parts that are really mapped.
> The unmapped parts are not important.
> 
> This allows nice optimizations like your 'batched unmap' on the Intel
> IOMMU driver side. The AMD IOMMU driver uses a round-robin bitmap
> allocator for the IO addresses which makes it very easy to flush certain
> IOTLB ranges only before they are reused.

... which implies that a mapping, once made, might *never* actually get
torn down until we loop and start reusing address space? That has
interesting security implications. Is it true even for devices which
have been assigned to a VM and then unassigned?

> >   - ... unless booted with 'intel_iommu=strict', in which case we do the
> >     unmap and IOTLB flush immediately before returning to the driver.
> 
> There is something similar on the AMD IOMMU side. There it is called
> unmap_flush.

OK, so that definitely wants consolidating into a generic option.

> >   - But the IOMMU API for virtualisation is different. In fact that
> >     doesn't seem to flush the IOTLB at all. Which is probably a bug.
> 
> Well, *current* requirement is, that the IOTLB is in sync with the
> page-table at every time. This is true for the iommu_map and especially
> for the iommu_unmap function. It means basically that the unmapped area
> needs to be flushed out of the IOTLBs before iommu_unmap returns.
> 
> Some time ago I proposed the iommu_commit() interface which changes
> these requirements. With this interface the requirement is that after a
> couple of map/unmap operations the IOMMU-API user has to call
> iommu_commit() to make these changes visible to the hardware (so mostly
> sync the IOTLBs). As discussed at that time this would make sense for
> the Intel and AMD IOMMU drivers.

I would *really* want to keep those off the fast path (thinking mostly
about DMA API here, since that's the performance issue). But as long as
we can achieve that, that's fine.

> > What is acceptable, though? That batched unmap is quite important for
> > performance, because it means that we don't have to bash on the hardware
> > and wait for a flush to complete in the fast path of network driver RX,
> > for example.
> 
> Have you considered a round-robin bitmap-allocator? It allows quite nice
> flushing behavior.

Yeah, I was just looking through the AMD code with a view to doing
something similar. I was thinking of using that technique *within* each
larger range allocated from the whole address space.

> > If we move to a model where we have a separate ->flush_iotlb() call, we
> > need to be careful that we still allow necessary optimisations to
> > happen.
> 
> With iommu_commit() this should be possible, still.
> 
> > I'm looking at fixing performance issues in the Intel IOMMU code, with
> > its virtual address space allocation (the rbtree-based one in iova.c
> > that nobody else uses, which has a single spinlock that *all* CPUs bash
> > on when they need to allocate).
> > 
> > The plan is, vaguely, to allocate large chunks of space to each CPU, and
> > then for each CPU to allocate from its own region first, thus ensuring
> > that the common case doesn't bounce locks between CPUs. It'll be rare
> > for one CPU to have to touch a subregion 'belonging' to another CPU, so
> > lock contention should be drastically reduced.
> 
> Thats an interesting issue. It exists on the AMD IOMMU side too, the
> bitmap-allocator runs in a per-domain spinlock which can get high
> contention. I am not sure how per-cpu chunks of the address space scale
> to large numbers of cpus, though, given that some devices only have a
> small address range that they can address.

I don't care about performance of broken hardware. If we have a single
*global* "subrange" for the <4GiB range of address space, that's
absolutely fine by me.

But also, it's not *so* much of an issue to divide the space up even
when it's limited. The idea was not to have it *strictly* per-CPU, but
just for a CPU to try allocating from "its own" subrange first, and then
fall back to allocating a new subrange, and *then* fall back to
allocating from subranges "belonging" to other CPUs. It's not that the
allocation from a subrange would be lockless — it's that the lock would
almost never leave the l1 cache of the CPU that *normally* uses that
subrange.

With batched unmaps, the CPU doing the unmap may end up taking the lock
occasionally, and bounce cache lines then. But it's infrequent enough
that it shouldn't be a performance problem.
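
Spelled out as code, the fallback order described above looks roughly like
this (a sketch only, with the locking reduced to comments and every name
invented for illustration; it is not the planned iova.c rework):

#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS       4
#define NR_SUBRANGES  16
#define PAGES_PER_SR  256

struct subrange {
        /* a real implementation would put a spinlock here */
        bool used[PAGES_PER_SR];
};

static struct subrange ranges[NR_SUBRANGES];
static bool claimed[NR_SUBRANGES];      /* subrange already owned by a CPU? */
static int  preferred[NR_CPUS];         /* subrange a CPU normally uses     */
static bool has_preferred[NR_CPUS];

/* Linear scan of one subrange; would run under that subrange's lock. */
static long try_subrange(int sr)
{
        for (size_t i = 0; i < PAGES_PER_SR; i++) {
                if (!ranges[sr].used[i]) {
                        ranges[sr].used[i] = true;
                        return (long)sr * PAGES_PER_SR + (long)i;
                }
        }
        return -1;
}

long alloc_iova(int cpu)
{
        long iova;

        /* 1) the subrange this CPU normally uses: its lock stays hot locally */
        if (has_preferred[cpu] && (iova = try_subrange(preferred[cpu])) >= 0)
                return iova;

        /* 2) claim a fresh subrange for this CPU (global lock in real code) */
        for (int i = 0; i < NR_SUBRANGES; i++) {
                if (!claimed[i]) {
                        claimed[i] = true;
                        preferred[cpu] = i;
                        has_preferred[cpu] = true;
                        return try_subrange(i);
                }
        }

        /* 3) rare slow path: allocate from subranges "belonging" to other CPUs */
        for (int i = 0; i < NR_SUBRANGES; i++)
                if ((iova = try_subrange(i)) >= 0)
                        return iova;

        return -1;      /* address space exhausted */
}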

> I have been thinking about some lockless algorithms for the
> bitmap-allocator. But the ideas are not finalized yet, so I still don't
> know if they will work out at all :)

As explained above, I wasn't going for lockless. I was going for
lock-contention-less. Or at least mostly :)

Do you think that approach sounds reasonable?

> The plan is to have a single DMA-API implementation for all IOMMU
> drivers (X86 and ARM) which just uses the IOMMU-API. But to make this
> performing reasonably well a few changes to the IOMMU-API are required.
> I already have some ideas which we can discuss if you want.

Yeah, that sounds useful.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 17:09         ` Joerg Roedel
@ 2011-11-10 21:12           ` Stepan Moskovchenko
  -1 siblings, 0 replies; 81+ messages in thread
From: Stepan Moskovchenko @ 2011-11-10 21:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: David Woodhouse, Kai Huang, Ohad Ben-Cohen, iommu, linux-omap,
	Laurent Pinchart, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, KyongHo Cho, kvm

On 11/10/2011 9:09 AM, Joerg Roedel wrote:
> The plan is to have a single DMA-API implementation for all IOMMU 
> drivers (X86 and ARM) which just uses the IOMMU-API. But to make this 
> performing reasonably well a few changes to the IOMMU-API are
> required. I already have some ideas which we can discuss if you want.

I have been experimenting with an iommu_map_range call, which maps a 
given scatterlist of discontiguous physical pages into a contiguous 
virtual region at a given IOVA. This has some performance advantages 
over just calling iommu_map iteratively. First, it reduces the amount of 
table walking / calculation needed for mapping each page, given how you 
know that all the pages will be mapped into a single 
virtually-contiguous region (so in most cases, the first-level table 
calculation can be reused). Second, it allows one to defer the TLB (and 
sometimes cache) maintenance operations until the entire scatterlist has 
been mapped, rather than doing a TLB invalidate after mapping each page, 
as would have been the case if iommu_map were just being called from 
within a loop. Granted, just using iommu_map many times may be 
acceptable on the slow path, but I have seen significant performance 
gains when using this approach on the fast path.
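
For illustration, the shape of such a call might look like the sketch below
(plain C, not the actual MSM driver; struct scatter_entry, map_one_page and
flush_iotlb are invented stand-ins). The point is that the scatterlist is
walked into one contiguous IOVA region and the IOTLB is flushed exactly
once at the end, instead of once per iommu_map call:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

struct scatter_entry {
        uint64_t phys;          /* physical address, PAGE_SIZE aligned    */
        size_t   len;           /* length in bytes, multiple of PAGE_SIZE */
};

/* Stand-ins for the per-page PTE write and the hardware flush. */
static int map_one_page(uint64_t iova, uint64_t phys)
{
        printf("map 0x%llx -> 0x%llx\n",
               (unsigned long long)iova, (unsigned long long)phys);
        return 0;
}

static void flush_iotlb(void)
{
        puts("IOTLB flush");
}

int iommu_map_range(uint64_t iova, const struct scatter_entry *sg, size_t nents)
{
        for (size_t i = 0; i < nents; i++) {
                for (size_t off = 0; off < sg[i].len; off += PAGE_SIZE) {
                        int ret = map_one_page(iova, sg[i].phys + off);

                        if (ret)
                                return ret;   /* a real driver would unmap what it did */
                        iova += PAGE_SIZE;
                }
        }
        flush_iotlb();          /* one flush for the whole scatterlist */
        return 0;
}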

Steve


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 19:28           ` David Woodhouse
  (?)
@ 2011-11-11 12:58             ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-11 12:58 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kai Huang, Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, kvm

On Thu, Nov 10, 2011 at 07:28:39PM +0000, David Woodhouse wrote:

> ... which implies that a mapping, once made, might *never* actually get
> torn down until we loop and start reusing address space? That has
> interesting security implications.

Yes, it is a trade-off between security and performance. But if the user
wants more security the unmap_flush parameter can be used.

> Is it true even for devices which have been assigned to a VM and then
> unassigned?

No, this is only used in the DMA-API path. The device-assignment code
uses the IOMMU-API directly. There the IOTLB is always flushed on unmap.

> > There is something similar on the AMD IOMMU side. There it is called
> > unmap_flush.
> 
> OK, so that definitely wants consolidating into a generic option.

Agreed.

> > Some time ago I proposed the iommu_commit() interface which changes
> > these requirements. With this interface the requirement is that after a
> > couple of map/unmap operations the IOMMU-API user has to call
> > iommu_commit() to make these changes visible to the hardware (so mostly
> > sync the IOTLBs). As discussed at that time this would make sense for
> > the Intel and AMD IOMMU drivers.
> 
> I would *really* want to keep those off the fast path (thinking mostly
> about DMA API here, since that's the performance issue). But as long as
> we can achieve that, that's fine.

For AMD IOMMU there is a feature called not-present cache. It says that
the IOMMU caches non-present entries as well and needs an IOTLB flush
when something is mapped (meant for software implementations of the
IOMMU).
So it can't be really taken out of the fast-path. But the IOMMU driver
can optimize the function so that it only flushes the IOTLB when there
was an unmap-call before. It is also an improvement over the current
situation where every iommu_unmap call results in a flush implicitly.
This is pretty much a no-go for using IOMMU-API in DMA mapping at the
moment.

> But also, it's not *so* much of an issue to divide the space up even
> when it's limited. The idea was not to have it *strictly* per-CPU, but
> just for a CPU to try allocating from "its own" subrange first, and then
> fall back to allocating a new subrange, and *then* fall back to
> allocating from subranges "belonging" to other CPUs. It's not that the
> allocation from a subrange would be lockless — it's that the lock would
> almost never leave the l1 cache of the CPU that *normally* uses that
> subrange.

Yeah, I get the idea. I fear that the memory consumption will get pretty
high with that approach. It basically means one round-robin allocator
per cpu and device. What does that mean on a 4096 CPU machine :)
How much lock contention will be lowered also depends on the work-load.
If dma-handles are frequently freed from another cpu than they were
allocated from, the same problem re-appears.
But in the end we have to try it out and see what works best :)


Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Changing IOMMU-API for generic DMA-mapping supported by the hardware
  2011-11-10 19:28           ` David Woodhouse
@ 2011-11-11 13:17             ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-11 13:17 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kai Huang, Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, kvm

Okay, separate thread for this one.

On Thu, Nov 10, 2011 at 07:28:39PM +0000, David Woodhouse wrote:
> > The plan is to have a single DMA-API implementation for all IOMMU
> > drivers (X86 and ARM) which just uses the IOMMU-API. But to make this
> > performing reasonably well a few changes to the IOMMU-API are required.
> > I already have some ideas which we can discuss if you want.
> 
> Yeah, that sounds useful.

As I said some changes to the IOMMU-API are required in my opinion.
These changes should also allow it to move over old-style IOMMUs like
Calgary or GART later.

The basic idea is that IOMMU drivers should be required to put every
device they are responsible for into a default domain. The DMA mapping
code can query this default domain for each device.

Also the default domain has capabilities that can be queried. Those
capabilities include the size and offset of the address space they can
re-map. For GART and Calgary this will be the aperture, for VT-d and AMD
IOMMU the whole 64-bit address space. Another capability is whether
addresses outside of that area are 1-1 mapped or not accessible to the
device.

The generic DMA-mapping code will use that information to initialize its
allocator and use iommu_map/iommu_unmap to create and destroy mappings
as requested by the DMA-API (but the DMA-mapping code does not need to
create a domain of its own).
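
As a rough sketch of what the DMA-mapping code would see under that scheme
(structure layout and names invented here purely for illustration, not the
interface being proposed verbatim):

#include <stdbool.h>
#include <stdint.h>

/* Capabilities the DMA-mapping code would query from the default domain. */
struct dma_window {
        uint64_t offset;           /* first IO virtual address that can be remapped */
        uint64_t size;             /* GART/Calgary: the aperture; VT-d/AMD: 64 bits */
        bool     outside_identity; /* outside the window: 1-1 mapped or blocked?    */
};

struct default_domain {
        struct dma_window window;
        /* driver-private state, e.g. the flat leaf-page array mentioned below */
};

/* The generic address allocator would be sized from the window alone. */
static inline uint64_t allocator_span(const struct default_domain *dom)
{
        return dom->window.size;
}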

The good thing about these default domains is that IOMMU drivers can
implement their own optimizations on it. The AMD IOMMU driver for
example already makes a distinction between dma-mapping domains and
other protection-domains. The optimization for dma-mapping domains is
that the leaf-pages of the page-table are kept in an array so that it
is very easy to find the PTE for an address. Those optimizations are
still possible with the default-domain concept.

In short, the benefits of the default-domain concept are:

	1) It allows existing optimizations for the DMA-mapping code
	   paths to persist
	2) It also fits old-style IOMMUs like GART, Calgary and others

An open problem is how to report reserved ranges of an address-space.
These ranges might exist from a BIOS requirement for 1-1 mapping of
certain address ranges (in AMD jargon: Unity mapped ranges, something
similar exists on VT-d afaik) or hardware requirements like the reserved
address range used for MSI interrupts.

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-10 21:12           ` Stepan Moskovchenko
  (?)
@ 2011-11-11 13:24             ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-11 13:24 UTC (permalink / raw)
  To: Stepan Moskovchenko
  Cc: David Woodhouse, Kai Huang, Ohad Ben-Cohen, iommu, linux-omap,
	Laurent Pinchart, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, KyongHo Cho, kvm

On Thu, Nov 10, 2011 at 01:12:00PM -0800, Stepan Moskovchenko wrote:
> I have been experimenting with an iommu_map_range call, which maps a
> given scatterlist of discontiguous physical pages into a contiguous
> virtual region at a given IOVA. This has some performance advantages
> over just calling iommu_map iteratively. First, it reduces the
> amount of table walking / calculation needed for mapping each page,
> given how you know that all the pages will be mapped into a single
> virtually-contiguous region (so in most cases, the first-level table
> calculation can be reused). Second, it allows one to defer the TLB
> (and sometimes cache) maintenance operations until the entire
> scatterlist has been mapped, rather than doing a TLB invalidate
> after mapping each page, as would have been the case if iommu_map
> were just being called from within a loop. Granted, just using
> iommu_map many times may be acceptable on the slow path, but I have
> seen significant performance gains when using this approach on the
> fast path.

Yes, from a performance point-of-view that makes sense, as an addition
to the existing iommu_map interface. Are the pages in the list allowed
to have different page-sizes?


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-11 12:58             ` Joerg Roedel
@ 2011-11-11 13:27               ` David Woodhouse
  -1 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2011-11-11 13:27 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Kai Huang, Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, kvm

On Fri, 2011-11-11 at 13:58 +0100, Joerg Roedel wrote:
> For AMD IOMMU there is a feature called not-present cache. It says that
> the IOMMU caches non-present entries as well and needs an IOTLB flush
> when something is mapped (meant for software implementations of the
> IOMMU).
> So it can't be really taken out of the fast-path. But the IOMMU driver
> can optimize the function so that it only flushes the IOTLB when there
> was an unmap-call before. 

We have exactly the same situation with the Intel IOMMU (we call it
'Caching Mode') for the same reasons.

I'd be wary about making the IOMMU driver *track* whether there was an
unmap call before — that seems like hard work and more cache contention,
especially if the ->commit() call happens on a CPU other than the one
that just did the unmap.

I'm also not sure exactly when you'd call the ->commit() function when
the DMA API is being used, and which 'side' of that API the
deferred-flush optimisations would live.

Would the optimisation be done on the generic side, only calling
->commit when it absolutely *has* to happen? (Or periodically after
unmaps have happened to avoid entries hanging around for ever?)

Or would the optimisation be done in the IOMMU driver, thus turning the
->commit() function into more of a *hint*? You could add a 'simon_says'
boolean argument to it, I suppose...?

> It is also an improvement over the current
> situation where every iommu_unmap call results in a flush implicitly.
> This is pretty much a no-go for using IOMMU-API in DMA mapping at the
> moment.

Right. That definitely needs to be handled. We just need to work out the
(above and other) details.

> > But also, it's not *so* much of an issue to divide the space up even
> > when it's limited. The idea was not to have it *strictly* per-CPU, but
> > just for a CPU to try allocating from "its own" subrange first…
> 
> Yeah, I get the idea. I fear that the memory consumption will get pretty
> high with that approach. It basically means one round-robin allocator
> per cpu and device. What does that mean on a 4096 CPU machine :)

Well, if your network device is taking interrupts, and mapping/unmapping
buffers across all 4096 CPUs, then your performance is screwed anyway :)

Certainly your concerns are valid, but I think we can cope with them
fairly reasonably. If we *do* have a large number of CPUs allocating for a
given domain, we can move to a per-node rather than per-CPU allocator.
And we can have dynamically sized allocation regions, so we aren't
wasting too much space on unused bitmaps if you map just *one* page from
each of your 4096 CPUs.

> How much lock contention will be lowered also depends on the work-load.
> If dma-handles are frequently freed from another cpu than they were
> allocated from, the same problem re-appears.

The idea is that dma handles are *infrequently* freed, in batches. So
we'll bounce the lock's cache line occasionally, but not all the time.

In "strict" or "unmap_flush" mode, you get to go slowly unless you do
the unmap on the same CPU that you mapped it from. I can live with that.

> But in the end we have to try it out and see what works best :)

Indeed. I'm just trying to work out if I should try to do the allocator
thing purely inside the Intel code first, and then try to move it out
and make it generic — or if I should start with making the DMA API work
with a wrapper around the IOMMU API, with your ->commit() and other
necessary changes. I think I'd prefer the latter, if we can work out how
it should look.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-11 13:27               ` David Woodhouse
  (?)
@ 2011-11-11 14:18                 ` Joerg Roedel
  -1 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-11 14:18 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kai Huang, Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, kvm

On Fri, Nov 11, 2011 at 01:27:28PM +0000, David Woodhouse wrote:
> On Fri, 2011-11-11 at 13:58 +0100, Joerg Roedel wrote:
> > For AMD IOMMU there is a feature called not-present cache. It says that
> > the IOMMU caches non-present entries as well and needs an IOTLB flush
> > when something is mapped (meant for software implementations of the
> > IOMMU).
> > So it can't be really taken out of the fast-path. But the IOMMU driver
> > can optimize the function so that it only flushes the IOTLB when there
> > was an unmap-call before. 
> 
> We have exactly the same situation with the Intel IOMMU (we call it
> 'Caching Mode') for the same reasons.
> 
> I'd be wary about making the IOMMU driver *track* whether there was an
> unmap call before — that seems like hard work and more cache contention,
> especially if the ->commit() call happens on a CPU other than the one
> that just did the unmap.

Well, the IOMMU driver does not need to track this globally or
per-IOMMU. It basically tracks it per-domain. This lowers the
cache-bouncing a bit.

> I'm also not sure exactly when you'd call the ->commit() function when
> the DMA API is being used, and which 'side' of that API the
> deferred-flush optimisations would live.

The commit function is called every time before one of the DMA-API
functions returns.

> Would the optimisation be done on the generic side, only calling
> ->commit when it absolutely *has* to happen? (Or periodically after
> unmaps have happened to avoid entries hanging around for ever?)
> 
> Or would the optimisation be done in the IOMMU driver, thus turning the
> ->commit() function into more of a *hint*? You could add a 'simon_says'
> boolean argument to it, I suppose...?

The IOMMU driver can't do that because special knowledge about the
address allocator is needed. So it has to happen in the generic part.
Okay, that optimization does not really work with the ->commit() idea,
now that I think about it. The requirement to call commit() on the
map-side would result in too many flushes.

So it may be better to just call it iommu_flush_tlb(domain) or
something. The user of the IOMMU-API can assume that the flush is only
necessary on the unmap-path. Stricter flushing rules need to be handled
in the IOMMU driver itself.
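
A minimal sketch of that semantic, with the per-domain tracking mentioned
above folded in (the demo_* names are invented; this is not the actual
IOMMU-API change, just the behaviour being described):

#include <stdbool.h>
#include <stdio.h>

struct demo_domain {
        bool need_flush;        /* set on unmap, cleared by the flush call */
};

static void hw_flush_iotlb(struct demo_domain *dom)
{
        (void)dom;
        puts("hardware IOTLB flush");
}

/* Unmap clears the PTEs but defers the IOTLB flush to the caller. */
void demo_unmap(struct demo_domain *dom /* , iova, size */)
{
        /* ... clear the page-table entries here ... */
        dom->need_flush = true;
}

/* The iommu_flush_tlb(domain)-style entry point: a no-op unless needed. */
void demo_flush_tlb(struct demo_domain *dom)
{
        if (!dom->need_flush)
                return;         /* nothing unmapped since the last flush */
        hw_flush_iotlb(dom);
        dom->need_flush = false;
}

A caching-mode (not-present cache) IOMMU would additionally have to flush
on map; as suggested above, that detail can stay inside the driver.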

> > It is also an improvement over the current
> > situation where every iommu_unmap call results in a flush implicitly.
> > This is pretty much a no-go for using IOMMU-API in DMA mapping at the
> > moment.
> 
> Right. That definitely needs to be handled. We just need to work out the
> (above and other) details.

Yes, I am convinced now that the commit() idea is not the right solution
:)
Some less generic semantics like an iommu_flush_tlb call-back would do
better with such an optimization.

> Certainly your concerns are valid, but I think we can cope with them
> fairly reasonably. If we *do* have a large number of CPUs allocating for a
> given domain, we can move to a per-node rather than per-CPU allocator.
> And we can have dynamically sized allocation regions, so we aren't
> wasting too much space on unused bitmaps if you map just *one* page from
> each of your 4096 CPUs.

It also somewhat requires that the chunks are reasonably small. On a
typical small system the amount of allocated DMA memory is rather small
(GPUs are the exception, of course).

Btw, do you have numbers on how much time is spent in spinlocks on
multi-node machines with the VT-d driver for some workloads? I remember
that I did this measurement some time ago on a 24 core AMD machine with
netperf on a 10GBit network card and the time spent spinning came out
at ~5%.
For 1Gbit network and disk-load it was around 0.5% on that machine.

> > How much lock contention will be lowered also depends on the work-load.
> > If dma-handles are frequently freed from another cpu than they were
> > allocated from, the same problem re-appears.
> 
> The idea is that dma handles are *infrequently* freed, in batches. So
> we'll bounce the lock's cache line occasionally, but not all the time.

Hmm, btw, how does this batch-freeing work? Is the freeing done when
the address space is filled up or is another approach used?

> > But in the end we have to try it out and see what works best :)
> 
> Indeed. I'm just trying to work out if I should try to do the allocator
> thing purely inside the Intel code first, and then try to move it out
> and make it generic — or if I should start with making the DMA API work
> with a wrapper around the IOMMU API, with your ->commit() and other
> necessary changes. I think I'd prefer the latter, if we can work out how
> it should look.

I think it is best to start with a generic DMA-mapping
implementation, benchmark it and then find the bottlenecks and try out
possible optimizations. In the beginning it is hard enough to come up
with an implementation that fits X86 and ARM at the same time. When this
is worked out we can continue to optimize :)


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-11-11 14:18                 ` Joerg Roedel
  0 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-11 14:18 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kai Huang, Ohad Ben-Cohen, iommu, linux-omap, Laurent Pinchart,
	linux-arm-kernel, David Brown, Arnd Bergmann, linux-kernel,
	Hiroshi Doyu, Stepan Moskovchenko, KyongHo Cho, kvm

On Fri, Nov 11, 2011 at 01:27:28PM +0000, David Woodhouse wrote:
> On Fri, 2011-11-11 at 13:58 +0100, Joerg Roedel wrote:
> > For AMD IOMMU there is a feature called not-present cache. It says that
> > the IOMMU caches non-present entries as well and needs an IOTLB flush
> > when something is mapped (meant for software implementations of the
> > IOMMU).
> > So it can't be really taken out of the fast-path. But the IOMMU driver
> > can optimize the function so that it only flushes the IOTLB when there
> > was an unmap-call before. 
> 
> We have exactly the same situation with the Intel IOMMU (we call it
> 'Caching Mode') for the same reasons.
> 
> I'd be wary about making the IOMMU driver *track* whether there was an
> unmap call before — that seems like hard work and more cache contention,
> especially if the ->commit() call happens on a CPU other than the one
> that just did the unmap.

Well, the IOMMU driver does not need to track this globally or
per-IOMMU. It basically tracks it per-domain. This lowers the
cache-bouncing a bit.

> I'm also not sure exactly when you'd call the ->commit() function when
> the DMA API is being used, and which 'side' of that API the
> deferred-flush optimisations would live.

The commit function is called every time before one of the DMA-API
functions return.

> Would the optimisation be done on the generic side, only calling
> ->commit when it absolutely *has* to happen? (Or periodically after
> unmaps have happened to avoid entries hanging around for ever?)
> 
> Or would the optimisation be done in the IOMMU driver, thus turning the
> ->commit() function into more of a *hint*? You could add a 'simon_says'
> boolean argument to it, I suppose...?

The IOMMU driver can't do that because special knowledge about the
address allocator is needed. So it has to happen in the generic part.
Okay, that optimization does not really work with the ->commit() idea,
as I think about it. With the requirement to call commit() on the
map-side would result in too many flushes to happen.

So it may be better to just call it iommu_flush_tlb(domain) or
something. The user of the IOMMU-API can assume that the flush is only
necessary on the unmap-path. Stricter flushing rules need to be handled
in the IOMMU driver itself.

> > It is also an improvement over the current
> > situation where every iommu_unmap call results in a flush implicitly.
> > This pretty much a no-go for using IOMMU-API in DMA mapping at the
> > moment.
> 
> Right. That definitely needs to be handled. We just need to work out the
> (above and other) details.

Yes, I am convinced now that the commit() idea is not the right solution
:)
Some less generic semantics like a iommu_flush_tlb call-back would do
better with such an optimization.

> Certainly your concerns are valid, but I think we can cope with them
> fairly reasonably. If we *do* have large number of CPUs allocating for a
> given domain, we can move to a per-node rather than per-CPU allocator.
> And we can have dynamically sized allocation regions, so we aren't
> wasting too much space on unused bitmaps if you map just *one* page from
> each of your 4096 CPUs.

It also requires somewhat that the chunks are reasonably small. On a
typical small system the amount of allocated DMA memory is rather small
(GPUs are the exception, of course).

Btw, do you have numbers how much time is spent in spinlocks for
multi-node machines and the VT-d driver for some workloads? I remember
that I did this measurement some time ago on a 24 core AMD machine with
netperf on a 10GBit network card and the time spent spinning came out
at ~5%.
For 1Gbit network and disk-load it was around 0.5% on that machine.

> > How much lock contention will be lowered also depends on the work-load.
> > If dma-handles are frequently freed from another cpu than they were
> > allocated from the same problem re-appears.
> 
> The idea is that dma handles are *infrequently* freed, in batches. So
> we'll bounce the lock's cache line occasionally, but not all the time.

Hmm, btw, how does this batch-freeing works? Is the freeing done when
the address space is filled up or is another approach used?

> > But in the end we have to try it out and see what works best :)
> 
> Indeed. I'm just trying to work out if I should try to do the allocator
> thing purely inside the Intel code first, and then try to move it out
> and make it generic — or if I should start with making the DMA API work
> with a wrapper around the IOMMU API, with your ->commit() and other
> necessary changes. I think I'd prefer the latter, if we can work out how
> it should look.

I think it is the best to start with a generic DMA-mapping
implementation, benchmark it and then find the bottlenecks and try out
possible optimizations. In the beginning it is hard enough to come up
with an implementation that fits X86 and ARM at the same time. When this
is worked out we can continue to optimize :)


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
@ 2011-11-11 14:18                 ` Joerg Roedel
  0 siblings, 0 replies; 81+ messages in thread
From: Joerg Roedel @ 2011-11-11 14:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Nov 11, 2011 at 01:27:28PM +0000, David Woodhouse wrote:
> On Fri, 2011-11-11 at 13:58 +0100, Joerg Roedel wrote:
> > For AMD IOMMU there is a feature called not-present cache. It says that
> > the IOMMU caches non-present entries as well and needs an IOTLB flush
> > when something is mapped (meant for software implementations of the
> > IOMMU).
> > So it can't be really taken out of the fast-path. But the IOMMU driver
> > can optimize the function so that it only flushes the IOTLB when there
> > was an unmap-call before. 
> 
> We have exactly the same situation with the Intel IOMMU (we call it
> 'Caching Mode') for the same reasons.
> 
> I'd be wary about making the IOMMU driver *track* whether there was an
> unmap call before ? that seems like hard work and more cache contention,
> especially if the ->commit() call happens on a CPU other than the one
> that just did the unmap.

Well, the IOMMU driver does not need to track this globally or
per-IOMMU. It basically tracks it per-domain. This lowers the
cache-bouncing a bit.

> I'm also not sure exactly when you'd call the ->commit() function when
> the DMA API is being used, and which 'side' of that API the
> deferred-flush optimisations would live.

The commit function is called every time before one of the DMA-API
functions return.

> Would the optimisation be done on the generic side, only calling
> ->commit when it absolutely *has* to happen? (Or periodically after
> unmaps have happened to avoid entries hanging around for ever?)
> 
> Or would the optimisation be done in the IOMMU driver, thus turning the
> ->commit() function into more of a *hint*? You could add a 'simon_says'
> boolean argument to it, I suppose...?

The IOMMU driver can't do that because special knowledge about the
address allocator is needed. So it has to happen in the generic part.
Okay, that optimization does not really work with the ->commit() idea,
as I think about it. With the requirement to call commit() on the
map-side would result in too many flushes to happen.

So it may be better to just call it iommu_flush_tlb(domain) or
something. The user of the IOMMU-API can assume that the flush is only
necessary on the unmap-path. Stricter flushing rules need to be handled
in the IOMMU driver itself.

> > It is also an improvement over the current
> > situation where every iommu_unmap call results in a flush implicitly.
> > This pretty much a no-go for using IOMMU-API in DMA mapping at the
> > moment.
> 
> Right. That definitely needs to be handled. We just need to work out the
> (above and other) details.

Yes, I am convinced now that the commit() idea is not the right solution
:)
Some less generic semantics like a iommu_flush_tlb call-back would do
better with such an optimization.

> Certainly your concerns are valid, but I think we can cope with them
> fairly reasonably. If we *do* have large number of CPUs allocating for a
> given domain, we can move to a per-node rather than per-CPU allocator.
> And we can have dynamically sized allocation regions, so we aren't
> wasting too much space on unused bitmaps if you map just *one* page from
> each of your 4096 CPUs.

It also requires somewhat that the chunks are reasonably small. On a
typical small system the amount of allocated DMA memory is rather small
(GPUs are the exception, of course).

Btw, do you have numbers how much time is spent in spinlocks for
multi-node machines and the VT-d driver for some workloads? I remember
that I did this measurement some time ago on a 24 core AMD machine with
netperf on a 10GBit network card and the time spent spinning came out
at ~5%.
For 1Gbit network and disk-load it was around 0.5% on that machine.

> > How much lock contention will be lowered also depends on the work-load.
> > If dma-handles are frequently freed from a different cpu than the one
> > they were allocated on, the same problem re-appears.
> 
> The idea is that dma handles are *infrequently* freed, in batches. So
> we'll bounce the lock's cache line occasionally, but not all the time.

Hmm, btw, how does this batch-freeing work? Is the freeing done when
the address space is filled up or is another approach used?

> > But in the end we have to try it out and see what works best :)
> 
> Indeed. I'm just trying to work out if I should try to do the allocator
> thing purely inside the Intel code first, and then try to move it out
> and make it generic, or if I should start with making the DMA API work
> with a wrapper around the IOMMU API, with your ->commit() and other
> necessary changes. I think I'd prefer the latter, if we can work out how
> it should look.

I think it is best to start with a generic DMA-mapping
implementation, benchmark it and then find the bottlenecks and try out
possible optimizations. In the beginning it is hard enough to come up
with an implementation that fits X86 and ARM at the same time. When this
is worked out we can continue to optimize :)


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-11 13:24             ` Joerg Roedel
@ 2011-11-12  2:04               ` Stepan Moskovchenko
  -1 siblings, 0 replies; 81+ messages in thread
From: Stepan Moskovchenko @ 2011-11-12  2:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: David Woodhouse, Kai Huang, Ohad Ben-Cohen, iommu, linux-omap,
	Laurent Pinchart, linux-arm-kernel, David Brown, Arnd Bergmann,
	linux-kernel, Hiroshi Doyu, KyongHo Cho, kvm

On 11/11/2011 5:24 AM, Joerg Roedel wrote:
> On Thu, Nov 10, 2011 at 01:12:00PM -0800, Stepan Moskovchenko wrote:
>> I have been experimenting with an iommu_map_range call, which maps a
>> given scatterlist of discontiguous physical pages into a contiguous
>> virtual region at a given IOVA. This has some performance advantages
>> over just calling iommu_map iteratively. First, it reduces the
>> amount of table walking / calculation needed for mapping each page,
>> given how you know that all the pages will be mapped into a single
>> virtually-contiguous region (so in most cases, the first-level table
>> calculation can be reused). Second, it allows one to defer the TLB
>> (and sometimes cache) maintenance operations until the entire
>> scatterlist has been mapped, rather than doing a TLB invalidate
>> after mapping each page, as would have been the case if iommu_map
>> were just being called from within a loop. Granted, just using
>> iommu_map many times may be acceptable on the slow path, but I have
>> seen significant performance gains when using this approach on the
>> fast path.
> Yes, from a performance point-of-view that makes sense, as an addition
> to the existing iommu_map interface. Are the pages in the list allowed
> to have different page-sizes?
>
>
> 	Joerg
>

Hello

Yes, the scatterlist is allowed to have different page sizes, but they
are required to have a length that is a multiple of 4K. If a page in the
list is bigger than 4K, the code will iteratively map it with 4K pages.
I suppose that, based on how my implementation is written, it would not
be too difficult to add checks for the proper length and VA/PA
alignments, and insert a 64K / 1M / 16M mapping if the alignment is
lucky and the SG item is big enough.
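
Roughly, the check I have in mind would be something like this (just an
illustration; the helper and the hard-coded size list are not from my
actual code, and SZ_* is as in asm/sizes.h):

/* Pick the largest page size this chunk of the mapping can use. */
static size_t pick_pgsize(unsigned long iova, phys_addr_t paddr, size_t len)
{
	static const size_t sizes[] = { SZ_16M, SZ_1M, SZ_64K, SZ_4K };
	int i;

	for (i = 0; i < ARRAY_SIZE(sizes); i++) {
		size_t sz = sizes[i];

		/* usable only if IOVA and PA are both aligned to sz and
		 * enough length is left in this SG item */
		if (len >= sz && !(iova & (sz - 1)) && !(paddr & (sz - 1)))
			return sz;
	}

	return SZ_4K;	/* never reached, since everything is 4K aligned */
}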

In my particular test case, even though the pages in the list might be
of different sizes, they are not guaranteed to be aligned properly, and I
would most likely have to fall back on mapping them as multiple
consecutive 4K pages anyway. But even so, having map_range consolidate a
lot of the common operations into one call still gives me a nice speed
boost.

I hadn't sent the patches out because this was all for my testing, but
would you be interested in me adding a map_range to the API? The
iommu_map_range call could even check whether the ops supports a
.map_range, and fall back to calling iommu_map repeatedly if the driver
doesn't support this operation natively. In my code, the function takes
a domain, an iova, a scatterlist, a length, and prot.
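
To make the proposal concrete, a rough sketch of what the generic
wrapper could look like is below; the .map_range op is of course purely
hypothetical, and I am assuming the byte-based iommu_map()/iommu_unmap()
from this patch series:

int iommu_map_range(struct iommu_domain *domain, unsigned long iova,
		    struct scatterlist *sg, size_t len, int prot)
{
	size_t mapped = 0;
	int ret;

	if (domain->ops->map_range)	/* hypothetical new op */
		return domain->ops->map_range(domain, iova, sg, len, prot);

	/* Fallback: map each SG element with plain iommu_map(). */
	while (sg && mapped < len) {
		size_t l = min_t(size_t, sg->length, len - mapped);

		ret = iommu_map(domain, iova + mapped, sg_phys(sg), l, prot);
		if (ret)
			goto undo;

		mapped += l;
		sg = sg_next(sg);
	}

	return 0;

undo:
	/* unroll everything mapped so far before failing */
	if (mapped)
		iommu_unmap(domain, iova, mapped);
	return ret;
}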

Steve

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
  2011-11-12  2:04               ` Stepan Moskovchenko
@ 2011-11-13  1:43                 ` KyongHo Cho
  -1 siblings, 0 replies; 81+ messages in thread
From: KyongHo Cho @ 2011-11-13  1:43 UTC (permalink / raw)
  To: Stepan Moskovchenko
  Cc: Joerg Roedel, Ohad Ben-Cohen, Kai Huang, kvm, Arnd Bergmann,
	linux-kernel, iommu, Laurent Pinchart, David Brown, linux-omap,
	David Woodhouse, linux-arm-kernel

On Sat, Nov 12, 2011 at 11:04 AM, Stepan Moskovchenko
<stepanm@codeaurora.org> wrote:
> Yes, the scatterlist is allowed to have different page sizes. But, they are
> required to have a length that is a multiple of 4K. If a page in the list is

An element in a scatterlist may not be aligned to 4K, but I think the IOMMU
driver must round it up to 4K if the IOMMU API supports iommu_map_range().

The first element in a scatterlist is often not aligned to 4K if the pages
are provided from userspace. Length + offset of the element must be
aligned to 4K, though.
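
What I mean by rounding is just something like the following
(illustration only, assuming a 4K PAGE_SIZE; it is not taken from any
existing driver):

/* Derive the 4K-aligned physical range covering one SG element. */
static void sg_aligned_range(struct scatterlist *sg,
			     phys_addr_t *start, size_t *span)
{
	*start = sg_phys(sg) & PAGE_MASK;		/* round start down */
	*span  = PAGE_ALIGN(sg->offset + sg->length);	/* round length up  */
}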

Thank you,
KyongHo.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: Changing IOMMU-API for generic DMA-mapping supported by the hardware
  2011-11-11 13:17             ` Joerg Roedel
@ 2011-11-24 12:52               ` Marek Szyprowski
  -1 siblings, 0 replies; 81+ messages in thread
From: Marek Szyprowski @ 2011-11-24 12:52 UTC (permalink / raw)
  To: 'Joerg Roedel', 'David Woodhouse'
  Cc: 'Ohad Ben-Cohen', 'KyongHo Cho',
	'Kai Huang', kvm, 'Arnd Bergmann',
	linux-kernel, iommu, 'Laurent Pinchart',
	'David Brown', linux-omap, 'Stepan Moskovchenko',
	linux-arm-kernel

Hello,

On Friday, November 11, 2011 2:17 PM Joerg Roedel wrote:

> Okay, separate thread for this one.

If possible, I would like to be CCed on further mails in this topic.

For the last few months I've been working on DMA-mapping changes for the
ARM architecture in order to add support for an IOMMU-aware DMA mapper.
The latest version of my patches is available here:
http://lists.linaro.org/pipermail/linaro-mm-sig/2011-October/000745.html

The next version will be posted soon.
 
> On Thu, Nov 10, 2011 at 07:28:39PM +0000, David Woodhouse wrote:
> > > The plan is to have a single DMA-API implementation for all IOMMU
> > > drivers (X86 and ARM) which just uses the IOMMU-API. But to make this
> > > performing reasonably well a few changes to the IOMMU-API are required.
> > > I already have some ideas which we can discuss if you want.
> >
> > Yeah, that sounds useful.
> 
> As I said some changes to the IOMMU-API are required in my opinion.
> These changes should also allow moving old-style IOMMUs like
> Calgary or GART over later.
> 
> The basic idea is that IOMMU drivers should be required to put every
> device they are responsible for into a default domain. The DMA mapping
> code can query this default domain for each device.

Good idea.
 
> Also the default domain has capabilities that can be queried. Those
> capabilities include the size and offset of the address space they can
> re-map. For GART and Calgary this will be the aperture, for VT-d and AMD
> IOMMU the whole 64bit address space. Another capability is whether
> addresses outside of that area are 1-1 mapped or not accessible to the
> device.
>
> The generic DMA-mapping code will use that information to initialize its
> allocator and uses iommu_map/iommu_unmap to create and destroy mappings
> as requested by the DMA-API (but the DMA-mapping code does not need to
> create a domain of its own).
> 
> The good thing about these default domains is that IOMMU drivers can
> implement their own optimizations on it. The AMD IOMMU driver for
> example already makes a distinction between dma-mapping domains and
> other protection-domains. The optimization for dma-mapping domains is
> that the leaf-pages of the page-table are kept in an array so that it
> is very easy to find the PTE for an address. Those optimizations are
> still possible with the default-domain concept.
> 
> In short, the benefits of the default-domain concept are:
> 
> 	1) It allows existing optimizations for the DMA-mapping code
> 	   paths to persist
> 	2) It also fits old-style IOMMUs like GART, Calgary and others
> 
> An open problem is how to report reserved ranges of an address-space.
> These ranges might exist from a BIOS requirement for 1-1 mapping of
> certain address ranges (in AMD jargon: Unity mapped ranges, something
> similar exists on VT-d afaik) or hardware requirements like the reserved
> address range used for MSI interrupts.

In my DMA-mapping IOMMU integration I've used a dma_iommu_mapping structure,
which contains a pointer to the iommu domain, a bitmap and a lock. Maybe we
should consider extending the iommu domain with an allocation bitmap (or
another structure that holds information about used/unused iova ranges)?
From the DMA-mapping (as an IOMMU client) perspective we only need two more
callbacks in the IOMMU API: alloc_iova_range() and free_iova_range().

Each IOMMU implementation can provide these calls based on an internal bitmap
allocator, which would also cover the issue with reserved ranges. What do you
think about such a solution?
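
Roughly, the two callbacks I have in mind would look like this (the
names and exact signatures are only a sketch, nothing that exists in
the API today):

/* Proposal sketch -- additions to struct iommu_ops, not existing API. */
struct iommu_ops_extension {
	/* Reserve 'size' bytes of IOVA space in 'domain', avoiding any
	 * reserved or unity-mapped ranges; returns the allocated start. */
	dma_addr_t (*alloc_iova_range)(struct iommu_domain *domain,
				       size_t size, unsigned long align);

	/* Return a previously allocated IOVA range to the domain. */
	void (*free_iova_range)(struct iommu_domain *domain,
				dma_addr_t iova, size_t size);
};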

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Changing IOMMU-API for generic DMA-mapping supported by the hardware
  2011-11-24 12:52               ` Marek Szyprowski
  (?)
@ 2011-11-24 15:27                 ` 'Joerg Roedel'
  -1 siblings, 0 replies; 81+ messages in thread
From: 'Joerg Roedel' @ 2011-11-24 15:27 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'David Woodhouse', 'Ohad Ben-Cohen',
	'KyongHo Cho', 'Kai Huang',
	kvm, 'Arnd Bergmann',
	linux-kernel, iommu, 'Laurent Pinchart',
	'David Brown', linux-omap, 'Stepan Moskovchenko',
	linux-arm-kernel

On Thu, Nov 24, 2011 at 01:52:33PM +0100, Marek Szyprowski wrote:
> In my DMA-mapping IOMMU integration I've used a dma_iommu_mapping structure,
> which contains a pointer to the iommu domain, a bitmap and a lock. Maybe we
> should consider extending the iommu domain with an allocation bitmap (or
> another structure that holds information about used/unused iova ranges)?
> From the DMA-mapping (as an IOMMU client) perspective we only need two more
> callbacks in the IOMMU API: alloc_iova_range() and free_iova_range().
> 
> Each IOMMU implementation can provide these calls based on an internal
> bitmap allocator, which would also cover the issue with reserved ranges.
> What do you think about such a solution?

Hmm, the main point of a generic DMA-mapping implementation is that a
common address-allocator will be used. Today every IOMMU driver that
implements the DMA-API has its own allocator; this is something to unify
across all drivers.

The allocator information can be stored in the default iommu_domain. We
need a user-private pointer there, but that is easy to add.
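
Something as simple as this sketch would be enough; the dma_priv name
and the accessor are of course only placeholders for whatever the
generic code ends up using:

/* Sketch only -- the dma_priv field does not exist in include/linux/iommu.h. */
struct iommu_domain {
	struct iommu_ops *ops;
	void *priv;		/* IOMMU driver private data, as today     */
	void *dma_priv;		/* new: generic DMA-mapping allocator data */
};

/* The generic DMA-mapping code attaches its allocator state to the
 * device's default domain once and looks it up again on every map/unmap. */
static inline void *domain_dma_data(struct iommu_domain *domain)
{
	return domain->dma_priv;
}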


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2011-11-24 15:29 UTC | newest]

Thread overview: 81+ messages
2011-10-17 11:27 [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
2011-10-17 11:27 ` Ohad Ben-Cohen
2011-10-17 11:27 ` Ohad Ben-Cohen
2011-10-17 11:27 ` [PATCH v4 1/7] iommu/core: stop converting bytes to page order back and forth Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27 ` [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-11-10  6:17   ` Kai Huang
2011-11-10  6:17     ` Kai Huang
2011-11-10  7:31     ` Ohad Ben-Cohen
2011-11-10  7:31       ` Ohad Ben-Cohen
2011-11-10 12:16       ` cody
2011-11-10 12:16         ` cody
2011-11-10 13:08         ` Joerg Roedel
2011-11-10 13:08           ` Joerg Roedel
2011-11-10 13:08           ` Joerg Roedel
2011-11-10 14:35           ` cody
2011-11-10 14:35             ` cody
2011-11-10 14:51             ` Joerg Roedel
2011-11-10 14:51               ` Joerg Roedel
2011-11-10 14:51               ` Joerg Roedel
2011-11-10 15:28     ` David Woodhouse
2011-11-10 15:28       ` David Woodhouse
2011-11-10 17:09       ` Joerg Roedel
2011-11-10 17:09         ` Joerg Roedel
2011-11-10 17:09         ` Joerg Roedel
2011-11-10 19:28         ` David Woodhouse
2011-11-10 19:28           ` David Woodhouse
2011-11-11 12:58           ` Joerg Roedel
2011-11-11 12:58             ` Joerg Roedel
2011-11-11 12:58             ` Joerg Roedel
2011-11-11 13:27             ` David Woodhouse
2011-11-11 13:27               ` David Woodhouse
2011-11-11 14:18               ` Joerg Roedel
2011-11-11 14:18                 ` Joerg Roedel
2011-11-11 14:18                 ` Joerg Roedel
2011-11-11 13:17           ` Changing IOMMU-API for generic DMA-mapping " Joerg Roedel
2011-11-11 13:17             ` Joerg Roedel
2011-11-24 12:52             ` Marek Szyprowski
2011-11-24 12:52               ` Marek Szyprowski
2011-11-24 15:27               ` 'Joerg Roedel'
2011-11-24 15:27                 ` 'Joerg Roedel'
2011-11-24 15:27                 ` 'Joerg Roedel'
2011-11-10 21:12         ` [PATCH v4 2/7] iommu/core: split mapping to page sizes as " Stepan Moskovchenko
2011-11-10 21:12           ` Stepan Moskovchenko
2011-11-11 13:24           ` Joerg Roedel
2011-11-11 13:24             ` Joerg Roedel
2011-11-11 13:24             ` Joerg Roedel
2011-11-12  2:04             ` Stepan Moskovchenko
2011-11-12  2:04               ` Stepan Moskovchenko
2011-11-13  1:43               ` KyongHo Cho
2011-11-13  1:43                 ` KyongHo Cho
2011-10-17 11:27 ` [PATCH v4 3/7] iommu/omap: announce supported page sizes Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27 ` [PATCH v4 4/7] iommu/msm: " Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27 ` [PATCH v4 5/7] iommu/amd: " Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27 ` [PATCH v4 6/7] iommu/intel: " Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27 ` [PATCH v4 7/7] iommu/core: remove the temporary pgsize settings Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-10-17 11:27   ` Ohad Ben-Cohen
2011-11-08 12:57 ` [PATCH v4 0/7] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
2011-11-08 12:57   ` Ohad Ben-Cohen
2011-11-08 14:01   ` Joerg Roedel
2011-11-08 14:01     ` Joerg Roedel
2011-11-08 14:01     ` Joerg Roedel
2011-11-08 14:03     ` Ohad Ben-Cohen
2011-11-08 14:03       ` Ohad Ben-Cohen
2011-11-08 16:23       ` Joerg Roedel
2011-11-08 16:23         ` Joerg Roedel
2011-11-08 16:23         ` Joerg Roedel
2011-11-08 16:43         ` Ohad Ben-Cohen
2011-11-08 16:43           ` Ohad Ben-Cohen
